Abstract

The ill-posedness of the inverse problem of recovering a regression function in 
a nonparametric instrumental variable model leads to estimators that may suffer 
from a very slow, logarithmic rate of convergence. In this paper, we show that 
restricting the problem to models with monotone regression functions and monotone
instruments significantly weakens the ill-posedness of the problem. In stark
contrast to the existing literature, the presence of a monotone instrument implies 
boundedness of our measure of ill-posedness when restricted to the space of monotone
functions. Based on this result we derive a novel non-asymptotic error bound
for the constrained estimator that imposes monotonicity of the regression function. 
For a given sample size, the bound is independent of the degree of ill-posedness 
as long as the regression function is not too steep. As an implication, the bound 
allows us to show that the constrained estimator converges at a fast, polynomial 
rate, independently of the degree of ill-posedness, in a large, but slowly shrinking 
neighborhood of constant functions. Our simulation study demonstrates significant
finite-sample performance gains from imposing monotonicity even when the
regression function is rather far from being a constant. We apply the constrained 
estimator to the problem of estimating gasoline demand functions from U.S. data.

1 Introduction


Despite the pervasive use of linear instrumental variable methods in empirical research, 
their nonparametric counterparts are far from enjoying similar popularity. Perhaps two 
of the main reasons for this originate from the observation that point-identification of the
regression function in the nonparametric instrumental variable (NPIV) model requires
completeness assumptions, which have been argued to be strong (Santos (2012)) and
non-testable (Canay, Santos, and Shaikh (2013)), and from the fact that the NPIV model is
ill-posed, which may cause regression function estimators in this model to suffer from a 
very slow, logarithmic rate of convergence (e.g. Blundell, Chen, and Kristensen (2007)). 

In this paper, we explore the possibility of imposing shape restrictions to improve 
statistical properties of the NPIV estimators and to achieve (partial) identification of the
NPIV model in the absence of completeness assumptions. We study the NPIV model 

$$Y = g(X) + \varepsilon, \qquad E[\varepsilon | W] = 0, \tag{1}$$

where Y is a dependent variable, X an endogenous regressor, and W an instrumental
variable (IV). We are interested in identification and estimation of the nonparametric
regression function g based on a random sample of size n from the distribution of (Y, X, W).
We impose two monotonicity conditions: (i) monotonicity of the regression function g (we
assume that g is increasing¹) and (ii) monotonicity of the reduced form relationship
between the endogenous regressor X and the instrument W in the sense that the conditional
distribution of X given W corresponding to higher values of W first-order stochastically
dominates the same conditional distribution corresponding to lower values of W (the
monotone IV assumption).

We show that these two monotonicity conditions together significantly change the
structure of the NPIV model, and weaken its ill-posedness. In particular, we demonstrate
that under the second condition, a slightly modified version of the sieve measure of
ill-posedness defined in Blundell, Chen, and Kristensen (2007) is bounded uniformly over
the dimension of the sieve space, when restricted to the set of monotone functions; see 
Section 2 for details. As a result, under our two monotonicity conditions, the constrained 
NPIV estimator that imposes monotonicity of the regression function g possesses a fast 
rate of convergence in a large but slowly shrinking neighborhood of constant functions. 

¹All results in the paper hold also when g is decreasing. In fact, as we show in Section 4, the sign of
the slope of g is identified under our monotonicity conditions.

More specifically, we derive a new non-asymptotic error bound for the constrained
estimator. The bound exhibits two regimes. The first regime applies when the function
g is not too steep, and the bound in this regime is independent of the sieve measure of
ill-posedness, which slows down the convergence rate of the unconstrained estimator. In
fact, under some further conditions, the bound in the first regime takes the following form:
with high probability,

$$\|\hat{g}^c - g\|_{2,t} \le C \left( \left( \frac{K \log n}{n} \right)^{1/2} + K^{-s} \right),$$

where $\hat{g}^c$ is the constrained estimator, $\|\cdot\|_{2,t}$ an appropriate $L^2$-norm, K the number of
series terms in the estimator $\hat{g}^c$, s the number of derivatives of the function g, and C some
constant; see Section 3 for details. Thus, the constrained estimator $\hat{g}^c$ has a fast rate of
convergence in the first regime, and the bound in this regime is of the same order, up to a
log-factor, as that for series estimators of conditional mean functions. The second regime
applies when the function g is sufficiently steep. In this regime, the bound is similar to that
for the unconstrained NPIV estimators. The steepness level separating the two regimes
depends on the sample size n and decreases as the sample size n grows large. Therefore,
for a given increasing function g, if the sample size n is not too large, the bound is in its
first regime, where the constrained estimator $\hat{g}^c$ does not suffer from ill-posedness of the
model. As the sample size n grows large, however, the bound eventually switches to the
second regime, where ill-posedness of the model undermines the statistical properties of
the constrained estimator $\hat{g}^c$ similarly to the case of the unconstrained estimator.

Intuitively, existence of the second regime of the bound is well expected. Indeed,
if the function g is strictly increasing, it lies in the interior of the constraint that g is
increasing. Hence, the constraint does not bind asymptotically so that, in sufficiently
large samples, the constrained estimator coincides with the unconstrained one and the
two estimators share the same convergence rate. In finite samples, however, the constraint
binds with non-negligible probability even if g is strictly increasing. The first regime of our
non-asymptotic bound captures this finite-sample phenomenon, and improvements from
imposing the monotonicity constraint on g in this regime can be understood as a boundary
effect. Importantly, and perhaps unexpectedly, we show that under the monotone IV
assumption, this boundary effect is so strong that ill-posedness of the problem completely
disappears in the first regime.² In addition, we demonstrate via our analytical results
as well as simulations that this boundary effect can be strong even far away from the
boundary and/or in large samples.

Our simulation experiments confirm these theoretical findings and demonstrate dramatic
finite-sample performance improvements of the constrained relative to the unconstrained
NPIV estimator when the monotone IV assumption is satisfied. Imposing
the monotonicity constraint on g removes the estimator’s non-monotone oscillations due
to sampling noise, which in ill-posed inverse problems can be particularly pronounced.
Therefore, imposing the monotonicity constraint significantly reduces variance while only
slightly increasing bias.

²Even though we have established the result that ill-posedness disappears in the first regime under the
monotone IV assumption, currently we do not know whether this assumption is necessary for the result.

In addition, we show that in the absence of completeness assumptions, that is, when 
the NPIV model is not point-identified, our monotonicity conditions have non-trivial
identification power, and can provide partial identification of the model.

We regard both monotonicity conditions as natural in many economic applications. 
In fact, both of these conditions often directly follow from economic theory. Consider the 
following generic example. Suppose an agent chooses input X (e.g. schooling) to produce 
an outcome Y (e.g. life-time earnings) such that $Y = g(X) + \varepsilon$, where $\varepsilon$ summarizes
determinants of outcome other than X. The cost of choosing a level X = x is $c(x, W, \eta)$,
where W is a cost-shifter (e.g. distance to college) and $\eta$ represents (possibly vector-valued)
unobserved heterogeneity in costs (e.g. family background, a family’s taste for
education, variation in local infrastructure). The agent’s optimization problem can then 
be written as 

$$X = \arg\max_{x} \{ g(x) + \varepsilon - c(x, W, \eta) \}$$

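so that, from the first-order condition of this optimization problem,

$$\frac{\partial X}{\partial W} = \frac{\partial^2 c / \partial X \partial W}{\partial^2 g / \partial X^2 - \partial^2 c / \partial X^2} \ge 0 \tag{2}$$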


if marginal costs are decreasing in W (i.e. $\partial^2 c/\partial X \partial W < 0$), marginal costs are increasing
in X (i.e. $\partial^2 c/\partial X^2 > 0$), and the production function is concave (i.e. $\partial^2 g/\partial X^2 < 0$).
As long as W is independent of the pair $(\varepsilon, \eta)$, condition (2) implies our monotone IV
assumption and g increasing corresponds to the assumption of a monotone regression
function. Dependence between $\eta$ and $\varepsilon$ generates endogeneity of X, and independence of
W from $(\varepsilon, \eta)$ implies that W can be used as an instrument for X.
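
For readers who want the intermediate step behind (2), the following sketch differentiates the first-order condition implicitly. It assumes an interior solution and twice continuously differentiable g and c, which the text does not state explicitly; the subscript notation $c_X$, $c_{XW}$, $c_{XX}$ for partial derivatives of c is introduced here only for brevity.

```latex
\begin{align*}
0 &= g'(X) - c_X(X, W, \eta)
  && \text{(first-order condition)} \\
0 &= \bigl( g''(X) - c_{XX}(X, W, \eta) \bigr) \frac{\partial X}{\partial W} - c_{XW}(X, W, \eta)
  && \text{(differentiate with respect to } W\text{)} \\
\frac{\partial X}{\partial W} &= \frac{c_{XW}(X, W, \eta)}{g''(X) - c_{XX}(X, W, \eta)}
  && \text{(solve for } \partial X / \partial W\text{)}
\end{align*}
```

When marginal costs are decreasing in W the numerator is negative, and when g is concave and marginal costs are increasing in X the denominator is negative, so the ratio is positive.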

Another example is the estimation of Engel curves. In this case, the outcome variable 
Y is the budget share of a good, the endogenous variable X is total expenditure, and 
the instrument W is gross income. Our monotonicity conditions are plausible in this 
example because for normal goods such as food-in, the budget share is decreasing in 
total expenditure, and total expenditure increases with gross income. Finally, consider 
the estimation of (Marshallian) demand curves. The outcome variable Y is quantity of 
a consumed good, the endogenous variable X is the price of the good, and W could be 
some variable that shifts production cost of the good. For a normal good, the Slutsky 
inequality predicts Y to be decreasing in price X as long as income effects are not too 
large. Furthermore, price is increasing in production cost and, thus, increasing in the 
instrument W, and so our monotonicity conditions are plausible in this example as well.

Both of our monotonicity assumptions are testable. For example, a test of the monotone
IV condition can be found in Lee, Linton, and Whang (2009). In this paper, we
extend their results by deriving an adaptive test of the monotone IV condition, with the
value of the involved smoothness parameter chosen in a data-driven fashion. This adaptation
procedure allows us to construct a test with desirable power properties when the
degree of smoothness of the conditional distribution of X given W is unknown. Regarding
our first monotonicity condition, to the best of our knowledge, there are no procedures in
the literature that consistently test monotonicity of the function g in the NPIV model (1). 
We consider such procedures in a separate project and, in this paper, propose a simple 
test of monotonicity of g given that the monotone IV condition holds. 

Matzkin (1994) advocates the use of shape restrictions in econometrics and argues that 
economic theory often provides restrictions on functions of interest, such as monotonicity, 
concavity, and/or Slutsky symmetry. In the context of the NPIV model (1), Freyberger 
and Horowitz (2013) show that, in the absence of point-identification, shape restrictions
may yield informative bounds on functionals of g and develop inference procedures when 
the regressor X and the instrument W are discrete. Blundell, Horowitz, and Parey (2013) 
demonstrate via simulations that imposing Slutsky inequalities in a quantile NPIV model 
for gasoline demand improves finite-sample properties of the NPIV estimator. Grasmair,
Scherzer, and Vanhems (2013) study the problem of demand estimation imposing various
constraints implied by economic theory, such as Slutsky inequalities, and derive the
convergence rate of a constrained NPIV estimator under an abstract projected source 
condition. Our results are different from theirs because we focus on non-asymptotic error 
bounds, with special emphasis on properties of our estimator in the neighborhood of the 
boundary, we derive our results under easily interpretable, low level conditions, and we 
find that our estimator does not suffer from ill-posedness of the problem in a large but
slowly shrinking neighborhood of constant functions. 

Other related literature. The NPIV model has received substantial attention in the 
recent econometrics literature. Newey and Powell (2003), Hall and Horowitz (2005),
Blundell, Chen, and Kristensen (2007), and Darolles, Fan, Florens, and Renault (2011) study
identification of the NPIV model (1) and propose estimators of the regression function
g. See Horowitz (2011, 2014) for recent surveys and further references. In the mildly
ill-posed case, Hall and Horowitz (2005) derive the minimax risk lower bound in $L^2$-norm
and show that their estimator achieves this lower bound. Under different conditions,
Chen and Reiß (2011) derive a similar bound for the mildly and the severely ill-posed
case and show that the estimator by Blundell, Chen, and Kristensen (2007) achieves this
bound. Chen and Christensen (2013) establish minimax risk bounds in the sup-norm,
again both for the mildly and the severely ill-posed case. The optimal convergence rates
in the severely ill-posed case were shown to be logarithmic, which means that the slow
convergence rate of existing estimators is not a deficiency of those estimators but rather
an intrinsic feature of the statistical inverse problem. 

There is also a large statistics literature on nonparametric estimation of monotone functions
when the regressor is exogenous, i.e. W = X, so that g is a conditional mean function.
This literature can be traced back at least to Brunk (1955). Surveys of this literature
and further references can be found in Yatchew (1998), Delecroix and Thomas-Agnan 
(2000), and Gijbels (2004). For the case in which the regression function is both smooth 
and monotone, many different ways of imposing monotonicity on the estimator have 
been studied; see, for example, Mukerjee (1988), Cheng and Lin (1981), Wright (1981), 
Friedman and Tibshirani (1984), Ramsay (1988), Mammen (1991), Ramsay (1998), Mammen
and Thomas-Agnan (1999), Hall and Huang (2001), Mammen, Marron, Turlach, and
Wand (2001), and Dette, Neumeyer, and Pilz (2006). Importantly, under the mild assumption
that the estimators consistently estimate the derivative of the regression function,
the standard unconstrained nonparametric regression estimators are known to be monotone
with probability approaching one when the regression function is strictly increasing.
Therefore, such estimators have the same rate of convergence as the corresponding constrained
estimators that impose monotonicity (Mammen (1991)). As a consequence, gains
from imposing a monotonicity constraint can only be expected when the regression function
is close to the boundary of the constraint and/or in finite samples. Zhang (2002)
and Chatterjee, Guntuboyina, and Sen (2013) formalize this intuition by deriving risk 
bounds of the isotonic (monotone) regression estimators and showing that these bounds 
imply fast convergence rates when the regression function has flat parts. Our results are 
different from theirs because we focus on the endogenous case with $W \ne X$ and study
the impact of monotonicity constraints on the ill-posedness property of the NPIV model 
which is absent in the standard regression problem. 

Notation. For a differentiable function $f : \mathbb{R} \to \mathbb{R}$, we use $Df(x)$ to denote its derivative.
When a function f has several arguments, we use D with an index to denote the
derivative of f with respect to the corresponding argument; for example, $D_w f(w,u)$ denotes
the partial derivative of f with respect to w. For random variables A and B, we denote by
$f_{A,B}(a,b)$, $f_{A|B}(a,b)$, and $f_A(a)$ the joint, conditional and marginal densities of (A, B), A
given B, and A, respectively. Similarly, we let $F_{A,B}(a,b)$, $F_{A|B}(a,b)$, and $F_A(a)$ refer to the
corresponding cumulative distribution functions. For an operator $T : L^2[0,1] \to L^2[0,1]$,
we let $\|T\|_2$ denote the operator norm defined as

$$\|T\|_2 = \sup_{h \in L^2[0,1]:\ \|h\|_2 = 1} \|Th\|_2.$$

Finally, by increasing and decreasing we mean that a function is non-decreasing and
non-increasing, respectively.

Outline. The remainder of the paper is organized as follows. In the next section, we 
analyze ill-posedness of the model (1) under our monotonicity conditions and derive a 
useful bound on a restricted measure of ill-posedness for the model (1). Section 3 discusses 
the implications of our monotonicity assumptions for estimation of the regression function 
g. In particular, we show that the rate of convergence of our estimator is never worse
than that of unconstrained estimators but may be much faster in a large, but slowly
shrinking, neighborhood of constant functions. Section 4 shows that our monotonicity
conditions have non-trivial identification power. Section 5 provides new tests of our two
monotonicity assumptions. In Section 6, we present results of a Monte Carlo simulation 
study that demonstrates large gains in performance of the constrained estimator relative to 
the unconstrained one. Finally, Section 7 applies the constrained estimator to the problem 
of estimating gasoline demand functions. All proofs are collected in the appendix. 

2 Boundedness of the Measure of Ill-posedness under 
Monotonicity 

In this section, we discuss the sense in which the ill-posedness of the NPIV model (1) 
is weakened by imposing our monotonicity conditions. In particular, we introduce a 
restricted measure of ill-posedness for this model (see equation (9)) and show that, in 
stark contrast to the existing literature, our measure is bounded (Corollary 1) when the 
monotone IV condition holds. 

The NPIV model requires solving the equation $E[Y|W] = E[g(X)|W]$ for the function
g. Letting $T : L^2[0,1] \to L^2[0,1]$ be the linear operator defined by
$(Th)(w) := E[h(X)|W = w] f_W(w)$ and denoting $m(w) := E[Y|W = w] f_W(w)$, we can express this
equation as

$$Tg = m. \tag{3}$$

In finite-dimensional regressions, the operator T corresponds to a finite-dimensional matrix
whose singular values are typically assumed to be nonzero (rank condition). Therefore,
the solution g is continuous in m, and consistent estimation of m at a fast convergence
rate leads to consistent estimation of g at the same fast convergence rate. In
infinite-dimensional models, however, T is an operator that, under weak conditions, possesses
infinitely many singular values that tend to zero. Therefore, small perturbations in
m may lead to large perturbations in g. This discontinuity renders equation (3) ill-posed
and introduces challenges in estimation of the NPIV model (1) that are not present in
parametric regressions or in nonparametric regressions with exogenous regressors; see
Horowitz (2011, 2014) for a more detailed discussion.

In this section, we show that, under our monotonicity conditions, there exists a finite
constant C such that for any monotone function g' and any constant function g'', with
m' = Tg' and m'' = Tg'', we have

$$\|g' - g''\|_{2,t} \le C \|m' - m''\|_2,$$

where $\|\cdot\|_{2,t}$ is a truncated $L^2$-norm defined below. This result plays a central role in our
derivation of the upper bound on the restricted measure of ill-posedness, of identification
bounds, and of fast convergence rates of a constrained NPIV estimator that imposes
monotonicity of g in a large but slowly shrinking neighborhood of constant functions.

We now introduce our assumptions. Let $0 < x_1 < \tilde{x}_1 < \tilde{x}_2 < x_2 < 1$ and
$0 < w_1 < w_2 < 1$ be some constants. We implicitly assume that $x_1$, $\tilde{x}_1$, and $w_1$ are close to 0 whereas
$x_2$, $\tilde{x}_2$, and $w_2$ are close to 1. Our first assumption is the monotone IV condition that
requires a monotone relationship between the endogenous regressor X and the instrument
W.

Assumption 1 (Monotone IV). For all $x, w', w'' \in (0,1)$,

$$w' \le w'' \quad \Rightarrow \quad F_{X|W}(x|w') \ge F_{X|W}(x|w''). \tag{4}$$

Furthermore, there exists a constant $C_F > 1$ such that

$$F_{X|W}(x|w_1) \ge C_F F_{X|W}(x|w_2), \quad \forall x \in (0, x_2) \tag{5}$$

and

$$C_F (1 - F_{X|W}(x|w_1)) \le 1 - F_{X|W}(x|w_2), \quad \forall x \in (x_1, 1). \tag{6}$$

Assumption 1 is crucial for our analysis. The first part, condition (4), requires first-order
stochastic dominance of the conditional distribution of the endogenous regressor X
given the instrument W as we increase the value of the instrument W. This condition
(4) is testable; see, for example, Lee, Linton, and Whang (2009). In Section 5 below, we
extend the results of Lee, Linton, and Whang (2009) by providing an adaptive test of the
first-order stochastic dominance condition (4).

The second and third parts of Assumption 1, conditions (5) and (6), strengthen the
stochastic dominance condition (4) in the sense that the conditional distribution is required
to “shift to the right” by a strictly positive amount at least between two values of
the instrument, $w_1$ and $w_2$, so that the instrument is not redundant. Conditions (5) and
(6) are rather weak as they require such a shift only in some intervals $(0, x_2)$ and $(x_1, 1)$,
respectively.

Condition (4) can be equivalently stated in terms of monotonicity with respect to the
instrument W of the reduced form first stage function. Indeed, by the Skorohod representation,
it is always possible to construct a random variable U distributed uniformly on
[0,1] such that U is independent of W, and equation X = r(W, U) holds for the reduced
form first stage function $r(w,u) := F^{-1}_{X|W}(u|w) := \inf\{x : F_{X|W}(x|w) \ge u\}$. Therefore,
condition (4) is equivalent to the assumption that the function $w \mapsto r(w,u)$ is increasing
for all $u \in [0,1]$. Notice, however, that our condition (4) allows for general unobserved
heterogeneity of dimension larger than one, for instance as in Example 2 below.
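
The generalized inverse that defines the reduced form first stage function can be computed numerically by bisection. The sketch below is only an illustration: the conditional cdf F(x|w) = x^(1+w) on [0,1] is a hypothetical family chosen because it satisfies condition (4) and has the closed-form quantile u^(1/(1+w)); the function names are likewise introduced here and do not come from the paper.

```python
def conditional_cdf(x, w):
    """Hypothetical conditional cdf F(x|w) = x**(1 + w) on [0, 1] (illustration only)."""
    x = min(max(x, 0.0), 1.0)
    return x ** (1.0 + w)


def reduced_form(w, u, tol=1e-12):
    """Generalized inverse r(w, u) = inf{x in [0, 1] : F(x|w) >= u}, found by bisection."""
    lo, hi = 0.0, 1.0
    if conditional_cdf(lo, w) >= u:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_cdf(mid, w) >= u:
            hi = mid  # mid satisfies F(mid|w) >= u, so the infimum is at or below mid
        else:
            lo = mid
    return hi


def is_increasing_in_w(u, w_grid):
    """Check that w -> r(w, u) is non-decreasing along a sorted grid of w values."""
    values = [reduced_form(w, u) for w in w_grid]
    return all(a <= b + 1e-9 for a, b in zip(values, values[1:]))
```

Because a larger w lowers this cdf at every interior x, the computed quantile r(w, u) rises with w, which is the equivalence stated above.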

Condition (4) is related to a corresponding condition in Kasy (2014) who assumes
that the (structural) first stage has the form X = r(W, U) where U, representing (potentially
multidimensional) unobserved heterogeneity, is independent of W, and the function
$w \mapsto r(w,u)$ is increasing for all values u. Kasy employs his condition for identification
of (nonseparable) triangular systems with multidimensional unobserved heterogeneity
whereas we use our condition (4) to derive a useful bound on the restricted measure
of ill-posedness and to obtain a fast rate of convergence of a monotone NPIV estimator
of g in the (separable) model (1). Condition (4) is not related to the monotone IV
assumption in the influential work by Manski and Pepper (2000) which requires the function
$w \mapsto E[\varepsilon|W = w]$ to be increasing. Instead, we maintain the mean independence
condition $E[\varepsilon|W] = 0$.

Assumption 2 (Density). (i) The joint distribution of the pair (X, W) is absolutely
continuous with respect to the Lebesgue measure on $[0,1]^2$ with the density $f_{X,W}(x,w)$
satisfying $\int_0^1 \int_0^1 f_{X,W}(x,w)^2 \, dx \, dw \le C_T$ for some finite constant $C_T$. (ii) There exists a
constant $c_f > 0$ such that $f_{X|W}(x|w) \ge c_f$ for all $x \in [x_1, x_2]$ and $w \in \{w_1, w_2\}$. (iii)
There exist constants $0 < c_W \le C_W < \infty$ such that $c_W \le f_W(w) \le C_W$ for all $w \in [0,1]$.

This is a mild regularity assumption. The first part of the assumption implies that
the operator T is compact. The second and the third parts of the assumption require the
conditional density of X given $W = w_1$ or $w_2$ and the marginal density of W to
be bounded away from zero over some intervals. Recall that we have $0 < x_1 < x_2 < 1$
and $0 < w_1 < w_2 < 1$. We could simply set $[x_1, x_2] = [w_1, w_2] = [0,1]$ in the second part
of the assumption but having $0 < x_1 < x_2 < 1$ and $0 < w_1 < w_2 < 1$ is required to allow
for densities such as the normal, which, even after a transformation to the interval [0,1],
may not yield a conditional density $f_{X|W}(x|w)$ bounded away from zero; see Example 1
below. Therefore, we allow for the general case $0 < x_1 < x_2 < 1$ and $0 < w_1 < w_2 < 1$.
The restriction $f_W(w) \le C_W$ for all $w \in [0,1]$ imposed in Assumption 2 is not actually
required for the results in this section, but rather those of Section 3.




We now provide two examples of distributions of (X, W) that satisfy Assumptions 1 
and 2, and show two possible ways in which the instrument W can shift the conditional 
distribution of X given W. Figure 1 displays the corresponding conditional distributions. 

Example 1 (Normal density). Let $(\tilde{X}, \tilde{W})$ be jointly normal with mean zero, variance
one, and correlation $0 < \rho < 1$. Let $\Phi(u)$ denote the distribution function of a N(0,1)
random variable. Define $X = \Phi(\tilde{X})$ and $W = \Phi(\tilde{W})$. Since $\tilde{X} = \rho \tilde{W} + (1 - \rho^2)^{1/2} U$ for
some standard normal random variable U that is independent of $\tilde{W}$, we have

$$X = \Phi(\rho \Phi^{-1}(W) + (1 - \rho^2)^{1/2} U)$$

where U is independent of W. Therefore, the pair (X, W) satisfies condition (4) of our
monotone IV Assumption 1. Lemma 7 in the appendix verifies that the remaining conditions
of Assumption 1 as well as Assumption 2 are also satisfied. □
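
Condition (4) in Example 1 can also be checked numerically. Solving the representation of X for U gives the conditional cdf $F_{X|W}(x|w) = \Phi((\Phi^{-1}(x) - \rho \Phi^{-1}(w)) / (1 - \rho^2)^{1/2})$; this closed form is derived here from the example and is not quoted from the paper. The sketch below evaluates it on a grid using only the Python standard library; the grid and the value ρ = 0.5 are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def conditional_cdf(x, w, rho):
    """F_{X|W}(x|w) implied by X = Phi(rho * Phi^{-1}(W) + sqrt(1 - rho^2) * U)."""
    z = (STD_NORMAL.inv_cdf(x) - rho * STD_NORMAL.inv_cdf(w)) / (1.0 - rho ** 2) ** 0.5
    return STD_NORMAL.cdf(z)


def satisfies_condition_4(rho, grid):
    """Check F(x|w') >= F(x|w'') on the grid whenever w' <= w'' (grid must be sorted)."""
    for x in grid:
        cdfs = [conditional_cdf(x, w, rho) for w in grid]
        if any(a < b for a, b in zip(cdfs, cdfs[1:])):
            return False
    return True


grid = [i / 20 for i in range(1, 20)]  # interior points 0.05, ..., 0.95
print(satisfies_condition_4(rho=0.5, grid=grid))
```

A grid check of this kind is only a sanity check, not a proof; the monotonicity itself follows because the argument of Φ is decreasing in w whenever ρ > 0.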

Example 2 (Two-dimensional unobserved heterogeneity). Let $X = U_1 + U_2 W$, where
$U_1, U_2, W$ are mutually independent, $U_1, U_2 \sim U[0,1/2]$ and $W \sim U[0,1]$. Since $U_2$
is positive, it is straightforward to see that the stochastic dominance condition (4) is
satisfied. Lemma 8 in the appendix shows that the remaining conditions of Assumption 1
as well as Assumption 2 are also satisfied. □
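
For Example 2 the conditional cdf has no equally compact closed form, but it can be approximated by averaging the cdf of $U_1$ over $U_2$. The sketch below does this with a midpoint rule; the number of nodes and the helper names are assumptions made for illustration.

```python
def uniform_half_cdf(t):
    """cdf of the U[0, 1/2] distribution."""
    return min(max(2.0 * t, 0.0), 1.0)


def conditional_cdf(x, w, n_nodes=2000):
    """F_{X|W}(x|w) for X = U1 + U2 * w: average P(U1 <= x - u2 * w) over U2 ~ U[0, 1/2]."""
    total = 0.0
    for i in range(n_nodes):
        u2 = (i + 0.5) / (2.0 * n_nodes)  # midpoints of an equal partition of [0, 1/2]
        total += uniform_half_cdf(x - u2 * w)
    return total / n_nodes


def satisfies_condition_4(x_grid, w_grid):
    """Check F(x|w') >= F(x|w'') on the grids whenever w' <= w'' (w_grid must be sorted)."""
    for x in x_grid:
        cdfs = [conditional_cdf(x, w) for w in w_grid]
        if any(a < b for a, b in zip(cdfs, cdfs[1:])):
            return False
    return True
```

Evaluating `conditional_cdf(0.6, 0.0)` and `conditional_cdf(0.6, 1.0)` also shows how the conditional support widens with w: at w = 0 all mass lies below 0.6, while at w = 1 it does not.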


Figure 1 shows that, in Example 1, the conditional distribution at two different values
of the instrument is shifted to the right at every value of X, whereas, in Example 2, the
conditional support of X given W = w changes with w, but the positive shift in the cdf
of X|W = w occurs only for values of X in a subinterval of [0,1].

Before stating our results in this section, we introduce some additional notation. Define
the truncated $L^2$-norm $\|\cdot\|_{2,t}$ by

$$\|h\|_{2,t} := \left( \int_{\tilde{x}_1}^{\tilde{x}_2} h(x)^2 \, dx \right)^{1/2}, \qquad h \in L^2[0,1].$$

Also, let $\mathcal{M}$ denote the set of all monotone functions in $L^2[0,1]$. Finally, define
$\zeta := (c_f, c_W, C_F, C_T, w_1, w_2, x_1, x_2, \tilde{x}_1, \tilde{x}_2)$. Below is our first main result in this section.

Theorem 1 (Lower Bound on T). Let Assumptions 1 and 2 be satisfied. Then there
exists a finite constant C depending only on $\zeta$ such that

$$\|h\|_{2,t} \le C \|Th\|_2 \tag{7}$$

for any function $h \in \mathcal{M}$.

To prove this theorem, we take a function $h \in \mathcal{M}$ with $\|h\|_{2,t} = 1$ and show that
$\|Th\|_2$ is bounded away from zero. A key observation that allows us to establish this
bound is that, under monotone IV Assumption 1, the function $w \mapsto E[h(X)|W = w]$ is
monotone whenever h is. Together with non-redundancy of the instrument W implied
by conditions (5) and (6) of Assumption 1, this allows us to show that $E[h(X)|W = w_1]$
and $E[h(X)|W = w_2]$ cannot both be close to zero so that $\|E[h(X)|W = \cdot]\|_2$ is bounded
from below by a strictly positive constant from the values of $E[h(X)|W = w]$ in the
neighborhood of either $w_1$ or $w_2$. By Assumption 2, $\|Th\|_2$ must then also be bounded
away from zero.
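
The key observation in this argument, that $w \mapsto E[h(X)|W = w]$ inherits monotonicity from h under the monotone IV condition, can be illustrated in the design of Example 1. The sketch below approximates the conditional mean with a midpoint rule over a uniform variable V, writing the standard normal disturbance as $\Phi^{-1}(V)$; the value ρ = 0.5, the number of nodes, and the test functions are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_mean(h, w, rho=0.5, n_nodes=400):
    """Approximate E[h(X)|W=w] in the Example 1 design by a midpoint rule.

    Uses X = Phi(rho * Phi^{-1}(w) + sqrt(1 - rho^2) * Phi^{-1}(V)) with V ~ U[0, 1].
    """
    shift = rho * STD_NORMAL.inv_cdf(w)
    scale = (1.0 - rho ** 2) ** 0.5
    total = 0.0
    for i in range(n_nodes):
        v = (i + 0.5) / n_nodes
        x = STD_NORMAL.cdf(shift + scale * STD_NORMAL.inv_cdf(v))
        total += h(x)
    return total / n_nodes


w_grid = [0.1, 0.3, 0.5, 0.7, 0.9]
# Increasing h: the conditional mean should increase along w_grid.
monotone_case = [conditional_mean(lambda x: x ** 3, w) for w in w_grid]
# Non-monotone h: nothing forces the conditional mean to be monotone in w.
u_shaped_case = [conditional_mean(lambda x: (x - 0.5) ** 2, w) for w in w_grid]
```

In this design the first list is increasing, while the second is U-shaped in w, which is why the restriction to monotone h matters for the argument.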

Theorem 1 has an important consequence. Indeed, consider the linear equation (3).
By Assumption 2(i), the operator T is compact, and so

$$\frac{\|h_k\|_2}{\|Th_k\|_2} \to \infty \text{ as } k \to \infty \text{ for some sequence } \{h_k, k \ge 1\} \subset L^2[0,1]. \tag{8}$$

Property (8) means that $\|Th\|_2$ being small does not necessarily imply that $\|h\|_2$ is small
and, therefore, the inverse of the operator $T : L^2[0,1] \to L^2[0,1]$, when it exists, cannot
be continuous. Therefore, (3) is ill-posed in Hadamard’s sense³, if no other conditions are
imposed. This is the main reason why standard NPIV estimators have a (potentially very)
slow rate of convergence. Theorem 1, on the other hand, implies that, under Assumptions
1 and 2, (8) is not possible if $h_k$ belongs to the set $\mathcal{M}$ of monotone functions in
$L^2[0,1]$ for all $k \ge 1$ and we replace the $L^2$-norm $\|\cdot\|_2$ in the numerator of the left-hand
side of (8) by the truncated $L^2$-norm $\|\cdot\|_{2,t}$, indicating that shape restrictions may be
helpful to improve statistical properties of the NPIV estimators. Also, in Remark 1, we
show that replacing the norm in the numerator is not a significant modification in the
sense that for most ill-posed problems, and in particular for all severely ill-posed problems,
(8) holds even if we replace the $L^2$-norm $\|\cdot\|_2$ in the numerator of the left-hand side of
(8) by the truncated $L^2$-norm $\|\cdot\|_{2,t}$.
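
To see the mechanism behind property (8) in a concrete case, the sketch below approximates the ratio $\|h\|_2 / \|Th\|_2$ in the design of Example 1 for one monotone function and two oscillating ones. It is a rough numerical illustration, not part of the paper's argument: the correlation ρ = 0.5, the midpoint grid, and the chosen functions are assumptions, and the untruncated $L^2$-norm is used in the numerator.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
RHO = 0.5  # assumed correlation in the Example 1 design
N = 200    # assumed number of midpoint nodes per dimension
NODES = [(i + 0.5) / N for i in range(N)]
Z_NODES = [STD_NORMAL.inv_cdf(p) for p in NODES]


def l2_norm(values):
    """Midpoint-rule approximation of the L2[0,1] norm from function values on NODES."""
    return math.sqrt(sum(v * v for v in values) / N)


def apply_T(h):
    """Approximate (Th)(w) = E[h(X)|W=w] f_W(w) on NODES; f_W = 1 because W is uniform."""
    scale = math.sqrt(1.0 - RHO ** 2)
    out = []
    for z_w in Z_NODES:
        mean = sum(h(STD_NORMAL.cdf(RHO * z_w + scale * z_u)) for z_u in Z_NODES) / N
        out.append(mean)
    return out


def norm_ratio(h):
    """Ratio ||h||_2 / ||Th||_2 appearing in property (8)."""
    return l2_norm([h(x) for x in NODES]) / l2_norm(apply_T(h))


ratio_monotone = norm_ratio(lambda x: x - 0.5)
ratio_slow_wave = norm_ratio(lambda x: math.cos(2 * math.pi * x))
ratio_fast_wave = norm_ratio(lambda x: math.cos(6 * math.pi * x))
```

In this design the ratio stays moderate for the monotone function and grows as the oscillation frequency increases, which is the behavior that (8) formalizes and that Theorem 1 rules out within the class of monotone functions.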

³Well- and ill-posedness in Hadamard’s sense are defined as follows. Let $A : D \to R$ be a continuous
mapping between metric spaces $(D, \rho_D)$ and $(R, \rho_R)$. Then, for $d \in D$ and $r \in R$, the equation Ad = r
is called “well-posed” on D in Hadamard’s sense (see Hadamard (1923)) if (i) A is bijective and (ii)
$A^{-1} : R \to D$ is continuous, so that for each $r \in R$ there exists a unique $d = A^{-1}r \in D$ satisfying Ad = r,
and, moreover, the solution $d = A^{-1}r$ is continuous in “the data” r. Otherwise, the equation is called
“ill-posed” in Hadamard’s sense.

Next, we derive an implication of Theorem 1 for the (quantitative) measure of ill-posedness
of the model (1). We first define the restricted measure of ill-posedness. For
$a \in \mathbb{R}$, let

$$\mathcal{H}(a) := \left\{ h \in L^2[0,1] : \inf_{0 \le x' < x'' \le 1} \frac{h(x'') - h(x')}{x'' - x'} \ge -a \right\}$$

be the space containing all functions in $L^2[0,1]$ with lower derivative bounded from below
by $-a$ uniformly over the interval [0,1]. Note that $\mathcal{H}(a') \subset \mathcal{H}(a'')$ whenever $a' \le a''$
and that $\mathcal{H}(0)$ is the set of increasing functions in $L^2[0,1]$. For continuously differentiable
functions, $h \in L^2[0,1]$ belongs to $\mathcal{H}(a)$ if and only if $\inf_{x \in [0,1]} Dh(x) \ge -a$. Further, define
the restricted measure of ill-posedness:


$$\tau(a) := \sup_{h \in \mathcal{H}(a):\ \|h\|_{2,t} = 1} \frac{\|h\|_{2,t}}{\|Th\|_2}. \tag{9}$$


As we discussed above, under our Assumptions 1 and 2, $\tau(\infty) = \infty$ if we use the $L^2$-norm
instead of the truncated $L^2$-norm in the numerator in (9). We show in Remark 1 below
that $\tau(\infty) = \infty$ for many ill-posed and, in particular, for all severely ill-posed problems
even with the truncated $L^2$-norm as defined in (9). However, Theorem 1 implies that $\tau(0)$
is bounded from above by C and, by definition, $\tau(a)$ is increasing in a, i.e. $\tau(a') \le \tau(a'')$
for $a' \le a''$. It turns out that $\tau(a)$ is bounded from above even for some positive values
of a:


Corollary 1 (Bound for the Restricted Measure of Ill-Posedness). Let Assumptions 1
and 2 be satisfied. Then there exist constants $c_\tau > 0$ and $0 < C_\tau < \infty$ depending only on
$\zeta$ such that

$$\tau(a) \le C_\tau \tag{10}$$

for all $a \le c_\tau$.


This is our second main result in this section. It is exactly this corollary of Theorem 
1 that allows us to obtain a fast convergence rate of our constrained NPIV estimator $\hat{g}^c$
not only when the regression function g is constant but, more generally, when g belongs 
to a large but slowly shrinking neighborhood of constant functions. 


Remark 1 (Ill-posedness is preserved by norm truncation). Under Assumptions 1 and
2, the integral operator T satisfies (8). Here we demonstrate that, in many cases, and in
particular in all severely ill-posed cases, (8) continues to hold if we replace the $L^2$-norm
$\|\cdot\|_2$ by the truncated $L^2$-norm $\|\cdot\|_{2,t}$ in the numerator of the left-hand side of (8), that
is, there exists a sequence $\{l_k, k \ge 1\}$ in $L^2[0,1]$ such that

$$\frac{\|l_k\|_{2,t}}{\|T l_k\|_2} \to \infty \text{ as } k \to \infty. \tag{11}$$

Indeed, under Assumptions 1 and 2, T is compact, and so the spectral theorem implies that
there exists a spectral decomposition of operator T, $\{(h_j, \varphi_j), j \ge 1\}$, where $\{h_j, j \ge 1\}$
is an orthonormal basis of $L^2[0,1]$ and $\{\varphi_j, j \ge 1\}$ is a decreasing sequence of positive
numbers such that $\varphi_j \to 0$ as $j \to \infty$, and $\|Th_j\|_2 = \varphi_j \|h_j\|_2 = \varphi_j$. Also, Lemma 6 in the
appendix shows that if $\{h_j, j \ge 1\}$ is an orthonormal basis in $L^2[0,1]$, then for any $\alpha > 0$,
$\|h_j\|_{2,t} > j^{-1/2-\alpha}$ for infinitely many j, and so there exists a subsequence $\{h_{j_k}, k \ge 1\}$
such that $\|h_{j_k}\|_{2,t} > j_k^{-1/2-\alpha}$. Therefore, under a weak condition that $j^{1/2+\alpha} \varphi_j \to 0$ as
$j \to \infty$, using $\|h_{j_k}\|_2 = 1$ for all $k \ge 1$, we conclude that for the subsequence $l_k = h_{j_k}$,

$$\frac{\|l_k\|_{2,t}}{\|T l_k\|_2} \ge \frac{\|h_{j_k}\|_2}{j_k^{1/2+\alpha} \|T h_{j_k}\|_2} = \frac{1}{j_k^{1/2+\alpha} \varphi_{j_k}} \to \infty \text{ as } k \to \infty,$$

leading to (11). Note also that the condition that $j^{1/2+\alpha} \varphi_j \to 0$ as $j \to \infty$ necessarily
holds if there exists a constant $c > 0$ such that $\varphi_j \le e^{-cj}$ for all large j, that is, if the
problem is severely ill-posed. Thus, under our Assumptions 1 and 2, the restriction in
Theorem 1 that h belongs to the space $\mathcal{M}$ of monotone functions in $L^2[0,1]$ plays a crucial
role for the result (7) to hold. On the other hand, whether the result (7) can be obtained
for all $h \in \mathcal{M}$ without imposing our monotone IV Assumption 1 appears to be an open
(and interesting) question. □


Remark 2 (Severe ill-posedness is preserved by norm truncation). One might wonder
whether our monotone IV Assumption 1 excludes all severely ill-posed problems, and
whether the norm truncation significantly changes these problems. Here we show that
there do exist severely ill-posed problems that satisfy our monotone IV Assumption 1,
and also that severely ill-posed problems remain severely ill-posed even if we replace the
$L^2$-norm $\|\cdot\|_2$ by the truncated $L^2$-norm $\|\cdot\|_{2,t}$. Indeed, consider Example 1 above.
Because, in this example, the pair $(X, W)$ is a transformation of the normal distribution,
it is well known that the integral operator $T$ in this example has singular values decreasing
exponentially fast. More specifically, the spectral decomposition $\{(h_k, \varphi_k), k \ge 1\}$ of the
operator $T$ satisfies $\varphi_k = \rho^k$ for all $k$ and some $\rho < 1$. Hence,

$$\frac{\|h_k\|_2}{\|T h_k\|_2} = \Big(\frac{1}{\rho}\Big)^k.$$

Since $(1/\rho)^k \to \infty$ as $k \to \infty$ exponentially fast, this example leads to a severely ill-posed
problem. Moreover, by Lemma 6, for any $\alpha > 0$ and $\rho' \in (\rho, 1)$,

$$\frac{\|h_k\|_{2,t}}{\|T h_k\|_2} > \frac{1}{k^{1/2+\alpha} \rho^k} \ge \Big(\frac{1}{\rho'}\Big)^k$$

for infinitely many $k$. Thus, replacing the $L^2$-norm $\|\cdot\|_2$ by the truncated $L^2$-norm $\|\cdot\|_{2,t}$
preserves the severe ill-posedness of the problem. However, it follows from Theorem 1 that
uniformly over all $h \in \mathcal{M}$, $\|h\|_{2,t} / \|Th\|_2 \le C$. Therefore, in this example, as well as in all
other severely ill-posed problems satisfying Assumptions 1 and 2, imposing monotonicity
on the function $h \in L^2[0,1]$ significantly changes the properties of the ratio $\|h\|_{2,t} / \|Th\|_2$.
□
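
As a rough numerical illustration of Remark 2, the sketch below evaluates both ratios for a hypothetical spectrum $\varphi_k = \rho^k$. The values $\rho = 0.5$, $\rho' = 0.6$, $\alpha = 0.1$ and all function names are assumptions chosen for illustration, and the lower bound from Lemma 6 is taken as given rather than verified.

```python
def untruncated_ratio(k: int, rho: float) -> float:
    """||h_k||_2 / ||T h_k||_2 = (1/rho)^k when phi_k = rho^k and ||h_k||_2 = 1."""
    return (1.0 / rho) ** k


def truncated_lower_bound(k: int, rho: float, alpha: float) -> float:
    """Lower bound 1 / (k^(1/2+alpha) * rho^k) on ||h_k||_{2,t} / ||T h_k||_2.

    By Lemma 6 this bound is only claimed for infinitely many k, not for every k.
    """
    return 1.0 / (k ** (0.5 + alpha) * rho ** k)


def dominates_geometric(k: int, rho: float, rho_prime: float, alpha: float) -> bool:
    """Check whether the lower bound is at least (1/rho')^k for this k."""
    return truncated_lower_bound(k, rho, alpha) >= (1.0 / rho_prime) ** k


if __name__ == "__main__":
    rho, rho_prime, alpha = 0.5, 0.6, 0.1  # illustrative values only
    for k in (5, 10, 20, 40):
        print(k, untruncated_ratio(k, rho), truncated_lower_bound(k, rho, alpha),
              dominates_geometric(k, rho, rho_prime, alpha))
```

The polynomial factor $k^{1/2+\alpha}$ is eventually swamped by the geometric gap between $\rho$ and $\rho'$, which is why the truncated ratio still grows exponentially.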
Remark 3 (Monotone IV Assumption does not imply control function approach). Our
monotone IV Assumption 1 does not imply the applicability of a control function
approach to estimation of the function $g$. Consider Example 2 above. In this example, the
relationship between $X$ and $W$ has a two-dimensional vector $(U_1, U_2)$ of unobserved
heterogeneity. Therefore, by Proposition 4 of Kasy (2011), there does not exist any control
function $C : [0,1]^2 \to \mathbb{R}$ such that (i) $C$ is invertible in its second argument, and (ii)
$X$ is independent of $\varepsilon$ conditional on $V = C(X, W)$. As a consequence, our monotone
IV Assumption 1 does not imply any of the existing control function conditions such as
those in Newey, Powell, and Vella (1999) and Imbens and Newey (2009), for example.⁴
Since multidimensional unobserved heterogeneity is common in economic applications
(see Imbens (2007) and Kasy (2014)), we view our approach to avoiding ill-posedness as
complementary to the control function approach. □

Remark 4 (On the role of norm truncation). Let us also briefly comment on the role
of the truncated norm $\|\cdot\|_{2,t}$ in (7). There are two reasons why we need the truncated
$L^2$-norm $\|\cdot\|_{2,t}$ rather than the usual $L^2$-norm $\|\cdot\|_2$. First, Lemma 2 in the appendix
shows that, under Assumptions 1 and 2, there exists a constant $0 < C_2 < \infty$ such that

$$\|h\|_1 \le C_2 \|Th\|_1$$

for any increasing and continuously differentiable function $h \in L^1[0,1]$. This result does
not require any truncation of the norms and implies boundedness of a measure of
ill-posedness defined in terms of $L^1[0,1]$-norms: $\sup_{h \in L^1[0,1],\ h \text{ increasing}} \|h\|_1 / \|Th\|_1$. To extend this
result to $L^2[0,1]$-norms we need to introduce a positive, but arbitrarily small, amount of
truncation at the boundaries, so that we have a control $\|h\|_{2,t} \le C\|h\|_1$ for some constant
$C$ and all monotone functions $h \in \mathcal{M}$. Second, we want to allow for the normal density
as in Example 1, which violates condition (ii) of Assumption 2 if we set $[x_1, x_2] = [0,1]$.
□

Remark 5 (Bounds on the measure of ill-posedness via compactness). Another approach
to obtain a result like (7) would be to employ compactness arguments. For example, let $b > 0$
be some (potentially large) constant and consider the class of functions $\mathcal{M}(b)$ consisting
of all functions $h$ in $\mathcal{M}$ such that $\|h\|_\infty = \sup_{x \in [0,1]} |h(x)| \le b$. It is well known that the set
$\mathcal{M}(b)$ is compact under the $L^2$-norm $\|\cdot\|_2$, and so, as long as $T$ is invertible, there exists
some $C > 0$ such that $\|h\|_2 \le C\|Th\|_2$ for all $h \in \mathcal{M}(b)$ since (i) $T$ is continuous and (ii)
any continuous function achieves its minimum on a compact set. This bound does not
require the monotone IV assumption and also does not require replacing the $L^2$-norm by
the truncated $L^2$-norm. Further, defining
$\tau(a, b) := \sup_{h \in \mathcal{H}(a):\ \|h\|_\infty \le b,\ \|h\|_2 = 1} \|h\|_2 / \|Th\|_2$ for
all $a > 0$ and using the same arguments as those in the proof of Corollary 1, one can show
that there exist some finite constants $c, C > 0$ such that $\tau(a, b) \le C$ for all $a \le c$. This
(seemingly interesting) result, however, is not useful for bounding the estimation error of
an estimator of $g$ because, as the proof of Theorem 2 in the next section reveals, obtaining
meaningful bounds would require a result of the form $\tau(a, b_n) \le C$ for all $a \le c$ for some
sequence $\{b_n, n \ge 1\}$ such that $b_n \to \infty$, even if we know that $\sup_{x \in [0,1]} |g(x)| \le b$ and we
impose this constraint on the estimator of $g$. In contrast, our arguments in Theorem 1,
being fundamentally different, do lead to meaningful bounds on the estimation error of
the constrained estimator of $g$. □

⁴It is easy to show that the existence of a control function does not imply our monotone IV condition
either, so our approach and the control function approach rely on conditions that are non-nested.

3 Non-asymptotic Risk Bounds Under Monotonicity 

The rate at which unconstrained NPIV estimators converge to $g$ depends crucially on
the so-called sieve measure of ill-posedness, which, unlike $\tau(a)$, does not measure
ill-posedness over the space $\mathcal{H}(a)$, but rather over the space $\mathcal{H}_n(\infty)$, a finite-dimensional
(sieve) approximation to $\mathcal{H}(\infty)$. In particular, the convergence rate is slower the faster
the sieve measure of ill-posedness grows with the dimensionality of the sieve space $\mathcal{H}_n(\infty)$.
The convergence rates can be as slow as logarithmic in the severely ill-posed case. Since
by Corollary 1, our monotonicity assumptions imply boundedness of $\tau(a)$ for some range
of finite values $a$, we expect these assumptions to translate into favorable performance of
a constrained estimator that imposes monotonicity of $g$. This intuition is confirmed by
the novel non-asymptotic error bounds we derive in this section (Theorem 2).

Let $(Y_i, X_i, W_i)$, $i = 1, \dots, n$, be an i.i.d. sample from the distribution of $(Y, X, W)$. To
define our estimator, we first introduce some notation. Let $\{p_k(x), k \ge 1\}$ and $\{q_k(w), k \ge 1\}$
be two orthonormal bases in $L^2[0,1]$. For $K = K_n \ge 1$ and $J = J_n \ge K_n$, denote

$$p(x) := (p_1(x), \dots, p_K(x))' \quad \text{and} \quad q(w) := (q_1(w), \dots, q_J(w))'.$$

Let $\mathbf{P} := (p(X_1), \dots, p(X_n))'$ and $\mathbf{Q} := (q(W_1), \dots, q(W_n))'$. Similarly, stack all
observations on $Y$ in $\mathbf{Y} := (Y_1, \dots, Y_n)'$. Let $\mathcal{H}_n(a)$ be a sequence of finite-dimensional spaces
defined by

$$\mathcal{H}_n(a) := \Big\{ h \in \mathcal{H}(a) : \exists\, b_1, \dots, b_{K_n} \in \mathbb{R} \text{ with } h = \sum_{j=1}^{K_n} b_j p_j \Big\},$$

which become dense in $\mathcal{H}(a)$ as $n \to \infty$. Throughout the paper, we assume that $\|g\|_2 < C_b$,
where $C_b$ is a large but finite constant known by the researcher. We define two estimators
of $g$: the unconstrained estimator $\hat g^u(x) := p(x)'\hat\beta^u$ with

$$\hat\beta^u := \operatorname*{argmin}_{b \in \mathbb{R}^K:\ \|b\| \le C_b} (\mathbf{Y} - \mathbf{P}b)'\mathbf{Q}(\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'(\mathbf{Y} - \mathbf{P}b), \qquad (12)$$

which is similar to the estimator defined in Horowitz (2012) and a special case of the
estimator considered in Blundell, Chen, and Kristensen (2007), and the constrained estimator
$\hat g^c(x) := p(x)'\hat\beta^c$ with

$$\hat\beta^c := \operatorname*{argmin}_{b \in \mathbb{R}^K:\ p(\cdot)'b \in \mathcal{H}_n(0),\ \|b\| \le C_b} (\mathbf{Y} - \mathbf{P}b)'\mathbf{Q}(\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'(\mathbf{Y} - \mathbf{P}b), \qquad (13)$$

which imposes the monotonicity of $g$ through the constraint $p(\cdot)'b \in \mathcal{H}_n(0)$.
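
To make the objective function shared by the two estimators concrete, the following sketch evaluates the quadratic form and computes its unconstrained minimizer in closed form. It assumes that the bound on the norm of $b$ is not binding, and it uses a plain monomial basis through the hypothetical helper `poly_basis` purely for brevity, whereas the text requires orthonormal bases. All function names are illustrative and are not taken from the paper.

```python
import numpy as np


def poly_basis(x, K):
    """Illustrative basis 1, x, ..., x^(K-1); NOT orthonormalized (an assumption)."""
    return np.vander(np.asarray(x, dtype=float), K, increasing=True)


def npiv_objective(b, Y, P, Q):
    """Evaluate (Y - P b)' Q (Q'Q)^{-1} Q' (Y - P b)."""
    r = Q.T @ (Y - P @ b)
    return float(r @ np.linalg.solve(Q.T @ Q, r))


def unconstrained_npiv(Y, P, Q):
    """Minimizer of the objective, assuming the norm bound on b is slack."""
    QtQ = Q.T @ Q
    A = Q.T @ P          # J x K cross-moment matrix
    c = Q.T @ Y          # J-vector
    lhs = A.T @ np.linalg.solve(QtQ, A)
    rhs = A.T @ np.linalg.solve(QtQ, c)
    return np.linalg.solve(lhs, rhs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    W = rng.uniform(size=n)
    X = 0.5 * W + 0.5 * rng.uniform(size=n)   # X depends on the instrument W
    P, Q = poly_basis(X, 3), poly_basis(W, 4)
    b_true = np.array([0.2, 1.0, -0.3])       # g(x) = 0.2 + x - 0.3 x^2, increasing on [0,1]
    Y = P @ b_true                            # noiseless outcome, for illustration only
    print(unconstrained_npiv(Y, P, Q))
```

The constrained estimator uses the same objective but adds the monotonicity restriction on $p(\cdot)'b$, so it generally requires a constrained quadratic-programming solver, which is not shown here.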

To study properties of the two estimators we introduce a finite-dimensional, or sieve,
counterpart of the restricted measure of ill-posedness $\tau(a)$ defined in (9) and also recall
the definition of the (unrestricted) sieve measure of ill-posedness. Specifically, define the
restricted and unrestricted sieve measures of ill-posedness $\tau_{n,t}(a)$ and $\tau_n$ as

$$\tau_{n,t}(a) := \sup_{h \in \mathcal{H}_n(a),\ \|h\|_{2,t} = 1} \frac{\|h\|_{2,t}}{\|Th\|_2} \quad \text{and} \quad \tau_n := \sup_{h \in \mathcal{H}_n(\infty)} \frac{\|h\|_2}{\|Th\|_2}.$$

The sieve measure of ill-posedness defined in Blundell, Chen, and Kristensen (2007) and
also used, for example, in Horowitz (2012) is $\tau_n$. Blundell, Chen, and Kristensen (2007)
show that $\tau_n$ is related to the singular values of $T$.⁵ If the singular values converge
to zero at the rate $K^{-r}$ as $K \to \infty$, then, under certain conditions, $\tau_n$ diverges at a
polynomial rate, that is, $\tau_n = O(K_n^r)$. This case is typically referred to as “mildly
ill-posed”. On the other hand, when the singular values decrease at a fast exponential rate,
then $\tau_n = O(e^{cK_n})$ for some constant $c > 0$. This case is typically referred to as “severely
ill-posed”.

Our restricted sieve measure of ill-posedness $\tau_{n,t}(a)$ is smaller than the unrestricted
sieve measure of ill-posedness $\tau_n$ because we replace the $L^2$-norm in the numerator by the
truncated $L^2$-norm and the space $\mathcal{H}_n(\infty)$ by $\mathcal{H}_n(a)$. As explained in Remark 1, replacing
the $L^2$-norm by the truncated $L^2$-norm does not make a crucial difference but, as follows
from Corollary 1, replacing $\mathcal{H}_n(\infty)$ by $\mathcal{H}_n(a)$ does. In particular, since $\tau(a) \le C_\tau$ for all
$a \le c_\tau$ by Corollary 1, we also have $\tau_{n,t}(a) \le C_\tau$ for all $a \le c_\tau$ because $\tau_{n,t}(a) \le \tau(a)$.
Thus, for all values of $a$ that are not too large, $\tau_{n,t}(a)$ remains bounded uniformly over
all $n$, no matter how fast the singular values of $T$ converge to zero.

We now specify conditions that we need to derive non-asymptotic error bounds for the
constrained estimator $\hat g^c(x)$.

⁵In fact, Blundell, Chen, and Kristensen (2007) talk about the eigenvalues of $T^*T$, where $T^*$ is the
adjoint of $T$, but there is a one-to-one relationship between eigenvalues of $T^*T$ and singular values of $T$.
Assumption 3 (Monotone regression function). The function $g$ is monotone increasing.

Assumption 4 (Moments). For some constant $C_B < \infty$, (i) $E[\varepsilon^2 | W] \le C_B$ and (ii)
$E[g(X)^2 | W] \le C_B$.

Assumption 5 (Relation between J and K). For some constant $C_J < \infty$, $J \le C_J K$.

Assumption 3, along with Assumption 1, is our main monotonicity condition.
Assumption 4 is a mild moment condition. Assumption 5 requires that the dimension of the
vector $q(w)$ is not much larger than the dimension of the vector $p(x)$. Let $s > 0$ be some
constant.

Assumption 6 (Approximation of g). There exist $\beta_n \in \mathbb{R}^K$ and a constant $C_g < \infty$ such
that the function $g_n(x) := p(x)'\beta_n$, defined for all $x \in [0,1]$, satisfies (i) $g_n \in \mathcal{H}_n(0)$, (ii)
$\|g - g_n\|_2 \le C_g K^{-s}$, and (iii) $\|T(g - g_n)\|_2 \le C_g \tau_n^{-1} K^{-s}$.

The first part of this condition requires the approximating function $g_n$ to be increasing.
The second part requires a particular bound on the approximation error in the $L^2$-norm.
De Vore (1977a,b) show that the assumption $\|g - g_n\|_2 \le C_g K^{-s}$ holds when the
approximating basis $p_1, \dots, p_K$ consists of polynomial or spline functions and $g$ belongs to
a Hölder class with smoothness level $s$. Therefore, approximation by monotone functions
is similar to approximation by all functions. The third part of this condition is similar to
Assumption 6 in Blundell, Chen, and Kristensen (2007).

Assumption 7 (Approximation of m). There exist $\gamma_n \in \mathbb{R}^J$ and a constant $C_m < \infty$
such that the function $m_n(w) := q(w)'\gamma_n$, defined for all $w \in [0,1]$, satisfies
$\|m - m_n\|_2 \le C_m \tau_n^{-1} J^{-s}$.

This condition is similar to Assumption 3(iii) in Horowitz (2012). Also, define the
operator $T_n : L^2[0,1] \to L^2[0,1]$ by

$$(T_n h)(w) := q(w)'E[q(W)p(X)']E[p(U)h(U)], \quad w \in [0,1],$$

where $U \sim U[0,1]$.

Assumption 8 (Operator T). (i) The operator $T$ is injective and (ii) for some constant
$C_a < \infty$, $\|(T - T_n)h\|_2 \le C_a \tau_n^{-1} K^{-s} \|h\|_2$ for all $h \in \mathcal{H}_n(\infty)$.

This condition is similar to Assumption 5 in Horowitz (2012). Finally, let

$$\xi_{K,p} := \sup_{x \in [0,1]} \|p(x)\|, \quad \xi_{J,q} := \sup_{w \in [0,1]} \|q(w)\|, \quad \xi_n := \max(\xi_{K,p}, \xi_{J,q}).$$

We start our analysis in this section with a simple observation that, if the function
$g$ is strictly increasing and the sample size $n$ is sufficiently large, then the constrained
estimator $\hat g^c$ coincides with the unconstrained estimator $\hat g^u$, and the two estimators share
the same rate of convergence.
Lemma 1 (Asymptotic equivalence of constrained and unconstrained estimators). Let
Assumptions 1-8 be satisfied. In addition, assume that $g$ is continuously differentiable
and $Dg(x) \ge c_g$ for all $x \in [0,1]$ and some constant $c_g > 0$. If $\tau_n^2 \xi_n^2 \log n / n \to 0$,
$\sup_{x \in [0,1]} \|Dp(x)\| (\tau_n (K/n)^{1/2} + K^{-s}) \to 0$, and $\sup_{x \in [0,1]} |Dg(x) - Dg_n(x)| \to 0$ as $n \to \infty$,
then

$$P\big(\hat g^c(x) = \hat g^u(x) \text{ for all } x \in [0,1]\big) \to 1 \quad \text{as } n \to \infty. \qquad (14)$$

The result in Lemma 1 is similar to that in Theorem 1 of Mammen (1991), which
shows equivalence (in the sense of (14)) of the constrained and unconstrained estimators
of conditional mean functions. Lemma 1 implies that imposing monotonicity of $g$ cannot
lead to improvements in the rate of convergence of the estimator if $g$ is strictly increasing.
However, the result in Lemma 1 is asymptotic and only applies to the interior of the
monotonicity constraint. It does not rule out faster convergence rates on or near the
boundary of the monotonicity constraint, nor does it rule out significant performance
gains in finite samples. In fact, our Monte Carlo simulation study in Section 6 shows
significant finite-sample performance improvements from imposing monotonicity even if
$g$ is strictly increasing and relatively far from the boundary of the constraint. Therefore,
we next derive a non-asymptotic estimation error bound for the constrained estimator
and study the impact of the monotonicity constraint on this bound.


Theorem 2 (Non-asymptotic error bound for the constrained estimator). Let
Assumptions 1-8 be satisfied, and let $\delta > 0$ be some constant. Assume that $\xi_n^2 \log n / n \le c$ for
sufficiently small $c > 0$. Then with probability at least $1 - \alpha - n^{-1}$, we have


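$$\|\hat g^c - g\|_{2,t} \le C\Big\{ \delta + \tau_{n,t}\Big(\frac{\|Dg_n\|_\infty}{\delta}\Big) \Big(\frac{K}{\alpha n} + \frac{\xi_n^2 \log n}{n}\Big)^{1/2} + K^{-s} \Big\} \qquad (15)$$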


and 


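$$\|\hat g^c - g\|_{2,t} \le C \min\Big\{ \|Dg\|_\infty + \Big(\frac{K}{\alpha n} + \frac{\xi_n^2 \log n}{n}\Big)^{1/2},\ \tau_n \Big(\frac{K}{\alpha n} + \frac{\xi_n^2 \log n}{n}\Big)^{1/2} \Big\} + C K^{-s}. \qquad (16)$$
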

Here the constants $c, C < \infty$ can be chosen to depend only on the constants appearing in
Assumptions 1-8.


This is the main result of this section. An important feature of this result is that
since the constant $C$ depends only on the constants appearing in Assumptions 1-8, the
bounds (15) and (16) hold uniformly over all data-generating processes that satisfy those
assumptions with the same constants. In particular, for any two data-generating processes
in this set, the same finite-sample bounds (15) and (16) hold with the same constant $C$,
even though the unrestricted sieve measure of ill-posedness $\tau_n$ may be of different order
of magnitude for these two data-generating processes.

Another important feature of the bound (15) is that it depends on the restricted sieve
measure of ill-posedness that we know to be smaller than the unrestricted sieve measure
of ill-posedness, appearing in the analysis of the unconstrained estimator. In particular,
we know from Section 2 that $\tau_{n,t}(a) \le \tau(a)$ and that, by Corollary 1, $\tau(a)$ is uniformly
bounded if $a$ is not too large. Employing this result, we obtain the bound (16) of Theorem
2.⁶

The bound (16) has two regimes depending on whether the following inequality

$$\|Dg\|_\infty \le (\tau_n - 1)\Big(\frac{K}{\alpha n} + \frac{\xi_n^2 \log n}{n}\Big)^{1/2} \qquad (17)$$

holds. The most interesting feature of this bound is that in the first regime, when the
inequality (17) is satisfied, the bound is independent of the (unrestricted) sieve measure of
ill-posedness $\tau_n$, and can be small if the function $g$ is not too steep, regardless of whether
the original NPIV model (1) is mildly or severely ill-posed. This is the regime in which
the bound relies upon the monotonicity constraint imposed on the estimator $\hat g^c$. For a
given sample size $n$, this regime is active if the function $g$ is not too steep.

As the sample size $n$ grows large, the right-hand side of inequality (17) decreases (if
$K = K_n$ grows slowly enough) and eventually becomes smaller than the left-hand side,
and the bound (16) switches to its second regime, in which it depends on the (unrestricted)
sieve measure of ill-posedness $\tau_n$. This is the regime in which the bound does not employ
the monotonicity constraint imposed on $\hat g^c$. However, since $\tau_n \to \infty$, potentially at a very
fast rate, even for relatively large sample sizes $n$ and/or relatively steep functions $g$, the
bound may be in its first regime, where the monotonicity constraint is important. The
presence of the first regime and the observation that it is active in a (potentially very)
large set of data-generating processes provide a theoretical justification for the importance
of imposing the monotonicity constraint on the estimators of the function $g$ in the NPIV
model (1) when the monotone IV Assumption 1 is satisfied.

A corollary of the existence of the first regime in the bound (16) is that the constrained
estimator $\hat g^c$ possesses a very fast rate of convergence in a large but slowly shrinking
neighborhood of constant functions, independent of the (unrestricted) sieve measure of
ill-posedness $\tau_n$:
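
The switch between the two regimes can be sketched numerically. The code below assumes that the stochastic part of the bound has the form $\min\{\|Dg\|_\infty + A_n,\ \tau_n A_n\}$ with $A_n = (K/(\alpha n) + \xi_n^2 \log n / n)^{1/2}$, so that the first regime is active exactly when $\|Dg\|_\infty \le (\tau_n - 1) A_n$. The leading constant and the approximation term are ignored, and all names and numerical inputs are hypothetical.

```python
import math


def noise_term(K, n, alpha, xi_sq):
    """A_n = (K / (alpha * n) + xi_n^2 * log(n) / n)^(1/2)."""
    return math.sqrt(K / (alpha * n) + xi_sq * math.log(n) / n)


def bound_regime(slope_sup, tau_n, K, n, alpha, xi_sq):
    """Return which term attains the minimum, together with its value.

    slope_sup plays the role of ||Dg||_inf; constants are set to one.
    """
    a_n = noise_term(K, n, alpha, xi_sq)
    first = slope_sup + a_n      # relies on the monotonicity constraint
    second = tau_n * a_n         # standard bound driven by ill-posedness
    if first <= second:
        return "first", first
    return "second", second


def first_regime_condition(slope_sup, tau_n, K, n, alpha, xi_sq):
    """Equivalent check: ||Dg||_inf <= (tau_n - 1) * A_n."""
    return slope_sup <= (tau_n - 1.0) * noise_term(K, n, alpha, xi_sq)


if __name__ == "__main__":
    # A severely ill-posed design (huge tau_n) stays in the first regime.
    print(bound_regime(0.5, 1e6, K=5, n=1000, alpha=0.05, xi_sq=5.0))
    # A nearly well-posed design (tau_n close to 1) falls into the second.
    print(bound_regime(0.5, 1.5, K=5, n=1000, alpha=0.05, xi_sq=5.0))
```

Under these assumptions, a larger $\tau_n$ makes the first regime more likely for a given slope, which matches the discussion above.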

Corollary 2 (Fast convergence rate of the constrained estimator under local-to-constant
asymptotics). Consider the triangular array asymptotics where the data generating
process, including the function $g$, is allowed to vary with $n$. Let Assumptions 1-8 be
satisfied with the same constants for all $n$. In addition, assume that $\xi_n^2 \le C_\xi K$ for some
$0 < C_\xi < \infty$ and $K \log n / n \to 0$. If $\sup_{x \in [0,1]} Dg(x) = O((K \log n / n)^{1/2})$, then

$$\|\hat g^c - g\|_{2,t} = O_p\big((K \log n / n)^{1/2} + K^{-s}\big). \qquad (18)$$

In particular, if $\sup_{x \in [0,1]} Dg(x) = O\big(n^{-s/(1+2s)} \sqrt{\log n}\big)$ and $K = K_n = C_K n^{1/(1+2s)}$ for
some $0 < C_K < \infty$, then

$$\|\hat g^c - g\|_{2,t} = O_p\big(n^{-s/(1+2s)} \sqrt{\log n}\big).$$

⁶Ideally, it would be of great interest to have a tight bound on the restricted sieve measure of
ill-posedness $\tau_{n,t}(a)$ for all $a > 0$, so that it would be possible to optimize (15) over $\delta$. Results of this form,
however, are not yet available in the literature, and so the optimization is not possible.

Remark 6 (On the condition $\xi_n^2 \le C_\xi K$). The condition $\xi_n^2 \le C_\xi K$, for $0 < C_\xi < \infty$,
is satisfied if the sequences $\{p_k(x), k \ge 1\}$ and $\{q_k(w), k \ge 1\}$ consist of commonly used
bases such as Fourier, spline, wavelet, or local polynomial partition series; see Belloni,
Chernozhukov, Chetverikov, and Kato (2014) for details. □

The local-to-constant asymptotics considered in this corollary captures the finite-sample
situation in which the regression function is not too steep relative to the sample
size. The convergence rate in this corollary is the standard polynomial rate of
nonparametric conditional mean regression estimators up to a $(\log n)^{1/2}$ factor, regardless of
whether the original NPIV problem without our monotonicity assumptions is mildly or
severely ill-posed. One way to interpret this result is that the constrained estimator $\hat g^c$
is able to recover regression functions in the shrinking neighborhood of constant
functions at a fast polynomial rate. Notice that the neighborhood of functions $g$ that satisfy
$\sup_{x \in [0,1]} Dg(x) = O((K \log n / n)^{1/2})$ is shrinking at a slow rate because $K \to \infty$; in
particular, the rate is much slower than $n^{-1/2}$. Therefore, in finite samples, we expect the
estimator to perform well for a wide range of (non-constant) regression functions $g$ as long
as the maximum slope of $g$ is not too large relative to the sample size.
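
The choice of $K_n$ in the second part of Corollary 2 can be checked by direct substitution. The short computation below is a sketch that treats $C_K$ and $s$ as fixed and uses only the two terms of the rate in (18):

```latex
\begin{align*}
K = C_K\, n^{1/(1+2s)}
  \quad\Longrightarrow\quad
  \Big(\frac{K \log n}{n}\Big)^{1/2}
  &= C_K^{1/2}\, n^{\frac{1}{2}\left(\frac{1}{1+2s} - 1\right)} \sqrt{\log n}
   = C_K^{1/2}\, n^{-s/(1+2s)} \sqrt{\log n}, \\
  K^{-s}
  &= C_K^{-s}\, n^{-s/(1+2s)}.
\end{align*}
```

Both terms are therefore of order at most $n^{-s/(1+2s)} \sqrt{\log n}$, so this choice of $K_n$ balances the variance and approximation terms up to the logarithmic factor.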

Remark 7 (The convergence rate of $\hat g^c$ is not slower than that of $\hat g^u$). If we replace the
condition $\xi_n^2 \log n / n \le c$ in Theorem 2 by a more restrictive condition $\tau_n^2 \xi_n^2 \log n / n \le c$,
then in addition to the bounds (15) and (16), it is possible to show that with probability
at least $1 - \alpha - n^{-1}$, we have

$$\|\hat g^c - g\|_2 \le C\big(\tau_n (K/(\alpha n))^{1/2} + K^{-s}\big).$$

This implies that the constrained estimator $\hat g^c$ satisfies $\|\hat g^c - g\|_2 = O_p(\tau_n (K/n)^{1/2} + K^{-s})$,
which is the standard minimax optimal rate of convergence established for the
unconstrained estimator $\hat g^u$ in Blundell, Chen, and Kristensen (2007). □
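
To see how the factor $\tau_n$ drives this rate, the sketch below minimizes the bound $\tau_n (K/n)^{1/2} + K^{-s}$ over $K$ for two hypothetical growth patterns: $\tau_n = K^r$ (mildly ill-posed) and $\tau_n = e^{cK}$ (severely ill-posed). The values of $r$, $c$, and $s$, the function names, and the convention of setting all constants to one are assumptions made for illustration only.

```python
import math


def error_bound(K, n, s, tau):
    """tau(K) * (K / n)^(1/2) + K^(-s), with all constants set to one."""
    return tau(K) * math.sqrt(K / n) + float(K) ** (-s)


def best_bound(n, s, tau, K_max=100):
    """Smallest value of the bound over K = 1, ..., K_max and the minimizing K."""
    K_star = min(range(1, K_max + 1), key=lambda K: error_bound(K, n, s, tau))
    return error_bound(K_star, n, s, tau), K_star


def tau_mild(K, r=1.0):
    """Polynomial growth K^r (mildly ill-posed case)."""
    return float(K) ** r


def tau_severe(K, c=0.5):
    """Exponential growth e^{cK} (severely ill-posed case)."""
    return math.exp(c * K)


if __name__ == "__main__":
    s = 2.0
    for n in (10**3, 10**5, 10**7):
        print(n, best_bound(n, s, tau_mild), best_bound(n, s, tau_severe))
```

In the exponential case the minimizing $K$ grows only logarithmically in $n$, so the attainable bound shrinks far more slowly than in the polynomial case.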

In conclusion, in general, the convergence rate of the constrained estimator is the same
as the standard minimax optimal rate, which depends on the degree of ill-posedness and
may, in the worst case, be logarithmic. This case occurs in the interior of the monotonicity
constraint when $g$ is strictly monotone. On the other hand, under the monotone IV
assumption, the constrained estimator converges at a very fast rate, independently of
the degree of ill-posedness, in a large but slowly shrinking neighborhood of constant
functions, a part of the boundary of the monotonicity constraint. In finite samples, we
expect to experience cases between the two extremes, and the bounds (15) and (16)
provide information on what the performance of the constrained estimator depends on in that
general case. Since the first regime of bound (16) is active in a large set of data generating
processes and sample size combinations, and since the fast convergence rate in Corollary 2
is obtained in a large but slowly shrinking neighborhood of constant functions, we expect
the boundary effect due to the monotonicity constraint to be strong even far away from
the boundary and for relatively large sample sizes.

Remark 8 (Average Partial Effects). We expect similar results to Theorem 2 and
Corollary 2 to hold in the estimation of linear functionals of $g$, such as average marginal effects.
In the unconstrained problem, estimators of linear functionals do not necessarily converge
at polynomial rates and may exhibit similarly slow, logarithmic rates as for estimation of
the function $g$ itself (e.g. Breunig and Johannes (2015)). Therefore, imposing
monotonicity as we do in this paper may also improve statistical properties of estimators of such
functionals. While we view this as a very important extension of our work, we develop
this direction in a separate paper. □

Remark 9 (On the role of the monotonicity constraint). Imposing the monotonicity
constraint in the NPIV estimation procedure reduces variance by removing non-monotone
oscillations in the estimator that are due to sampling noise. Such oscillations are a
common feature of unconstrained estimators in ill-posed inverse problems and lead to large
variance of such estimators. The reason for this phenomenon can be seen in the
convergence rate of unconstrained estimators,⁷ $\tau_n (K/n)^{1/2} + K^{-s}$, in which the variance term
$(K/n)^{1/2}$ is blown up by the multiplication by the measure of ill-posedness $\tau_n$. Because
of this relatively large variance of NPIV estimators we expect the unconstrained
estimator to possess non-monotonicities even in large samples and even if $g$ is far away from
constant functions. Therefore, imposing monotonicity of $g$ can have significant impact on
the estimator’s performance even in those cases. □

Remark 10 (On robustness of the constrained estimator, I). Implementation of the
estimators $\hat g^c$ and $\hat g^u$ requires selecting the number of series terms $K = K_n$ and $J = J_n$.
This is a difficult problem because the measure of ill-posedness $\tau_n = \tau(K_n)$, appearing in
the convergence rate of both estimators, depends on $K = K_n$ and can blow up quickly as
we increase $K$. Therefore, setting $K$ higher than the optimal value may result in a severe
deterioration of the statistical properties of $\hat g^u$. The problem is alleviated, however, in
the case of the constrained estimator $\hat g^c$ because $\hat g^c$ satisfies the bound (16) of Theorem 2,
which is independent of $\tau_n$ for sufficiently large $K$. In this sense, the constrained estimator
$\hat g^c$ possesses some robustness against setting $K$ too high. □

⁷See, for example, Blundell, Chen, and Kristensen (2007).

Remark 11 (On robustness of the constrained estimator, II). Notice that the fast
convergence rates in the local-to-constant asymptotics derived in this section are obtained
under two monotonicity conditions, Assumptions 1 and 3, but the estimator imposes only
the monotonicity of the regression function, not that of the instrument. Therefore, our
proposed constrained estimator consistently estimates the regression function $g$ even when
the monotone IV assumption is violated. □

Remark 12 (On alternative estimation procedures). In the local-to-constant asymptotic
framework where $\sup_{x \in [0,1]} Dg(x) = O((K \log n / n)^{1/2})$, the rate of convergence in (18)
can also be obtained by simply fitting a constant. However, such an estimator, unlike
our constrained estimator, is not consistent when the regression function $g$ does not drift
towards a constant. Alternatively, one can consider a sequential approach to estimating
$g$, namely one can first test whether the function $g$ is constant, and then either fit the
constant or apply the unconstrained estimator $\hat g^u$ depending on the result of the test.
However, it seems difficult to tune such a test to match the performance of the constrained
estimator $\hat g^c$ studied in this paper. □

Remark 13 (Estimating partially flat functions). Since the inversion of the operator $T$
is a global inversion in the sense that the resulting estimators $\hat g^c(x)$ and $\hat g^u(x)$ depend
not only on the shape of $g(x)$ locally at $x$, but on the shape of $g$ over the whole domain,
we do not expect convergence rate improvements from imposing monotonicity when the
function $g$ is partially flat. However, we leave the question about potential improvements
from imposing monotonicity in this case for future research. □

Remark 14 (Computational aspects). The implementation of the constrained estimator
in (13) is particularly simple when the basis vector $p(x)$ consists of polynomials or
B-splines of order 2. In that case, $Dp(x)$ is linear in $x$ and, therefore, the constraint
$Dp(x)'b \ge 0$ for all $x \in [0,1]$ needs to be imposed only at the endpoints of $[0,1]$ for
polynomials, or at the knots for B-splines. The estimator $\hat\beta^c$ thus minimizes a quadratic
objective function subject to a (finite-dimensional) linear inequality constraint. When the
order of the polynomials or B-splines in $p(x)$ is larger than 2, imposing the monotonicity
constraint is slightly more complicated, but it can still be transformed into a
finite-dimensional constraint using a representation of non-negative polynomials as a sum of
squared polynomials:⁸ one can represent any non-negative polynomial $f : \mathbb{R} \to \mathbb{R}$ as a sum
of squares of polynomials (see the survey by Reznick (2000), for example), i.e.
$f(x) = \tilde p(x)'M\tilde p(x)$ where $\tilde p(x)$ is the vector of monomials up to some order and $M$ a matrix
of coefficients. Letting $f(x) = Dp(x)'b$, our monotonicity constraint $f(x) \ge 0$ can then be
written as $\tilde p(x)'M\tilde p(x) \ge 0$ for some matrix $M$ that depends on $b$. This condition is
equivalent to requiring the matrix $M$ to be positive semi-definite. $\hat\beta^c$ thus minimizes a
quadratic objective function subject to a (finite-dimensional) semi-definiteness constraint.

For polynomials defined not over the whole of $\mathbb{R}$ but only over a compact sub-interval of $\mathbb{R}$,
one can use the same reasoning as above together with a result attributed to M. Fekete
(see Powers and Reznick (2000), for example): for any polynomial $f(x)$ with $f(x) \ge 0$ for
$x \in [-1, 1]$, there are polynomials $f_1(x)$ and $f_2(x)$, non-negative over the whole of $\mathbb{R}$, such that
$f(x) = f_1(x) + (1 - x^2) f_2(x)$. Letting again $f(x) = Dp(x)'b$, one can therefore impose
our monotonicity constraint by imposing the positive semi-definiteness of the coefficients
in the sums-of-squares representation of $f_1(x)$ and $f_2(x)$. □

⁸We thank A. Belloni for pointing out this possibility.
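
The simplest case in Remark 14 can be sketched in a few lines. The code below assumes the illustrative quadratic basis $p(x) = (1, x, x^2)'$, for which $Dp(x)'b = b_2 + 2 b_3 x$ is linear in $x$, so that non-negativity on $[0,1]$ reduces to two linear inequalities at the endpoints. The function names are hypothetical, and the grid check is included only as a brute-force comparison.

```python
def slope_constraint_rows():
    """Rows a with a . b >= 0 enforcing Dp(x)'b >= 0 at x = 0 and x = 1.

    Assumes the illustrative basis p(x) = (1, x, x^2), so Dp(x) = (0, 1, 2x).
    """
    return [(0.0, 1.0, 0.0), (0.0, 1.0, 2.0)]


def satisfies_constraints(b, rows, tol=1e-12):
    """Check the finite set of linear inequality constraints."""
    return all(sum(a_j * b_j for a_j, b_j in zip(a, b)) >= -tol for a in rows)


def slope(b, x):
    """Derivative of b[0] + b[1] x + b[2] x^2 at x."""
    return b[1] + 2.0 * b[2] * x


def monotone_on_grid(b, m=1000, tol=1e-12):
    """Brute-force check of the slope on an equally spaced grid over [0, 1]."""
    return all(slope(b, i / m) >= -tol for i in range(m + 1))


if __name__ == "__main__":
    rows = slope_constraint_rows()
    for b in [(0.0, 1.0, -0.4), (0.0, 1.0, -0.6), (3.0, -0.1, 1.0)]:
        print(b, satisfies_constraints(b, rows), monotone_on_grid(b))
```

Because a linear function on an interval attains its minimum at an endpoint, the two endpoint checks agree with the grid check for every coefficient vector.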

Remark 15 (Penalization and shape constraints). Recall that the estimators $\hat g^c$ and $\hat g^u$
require setting the constraint $\|b\| \le C_b$ in the optimization problems (12) and (13). In
practice, this constraint, or similar constraints in terms of Sobolev norms, which also
impose bounds on derivatives of $g$, are typically not enforced in the implementation of an
NPIV estimator. Horowitz (2012) and Horowitz and Lee (2012), for example, observe that
imposing the constraint does not seem to have an effect in their simulations. On the other
hand, especially when one includes many series terms in the computation of the estimator,
Blundell, Chen, and Kristensen (2007) and Gagliardini and Scaillet (2012), for example,
argue that penalizing the norm of $g$ and of its derivatives may stabilize the estimator by
reducing its variance. In this sense, penalizing the norm of $g$ and of its derivatives may
have a similar effect as imposing monotonicity. However, there are at least two important
differences between penalization and imposing monotonicity. First, penalization increases
the bias of the estimators. In fact, especially in severely ill-posed problems, even a small amount
of penalization may lead to large bias. In contrast, the monotonicity constraint on the
estimator does not increase bias much when the function $g$ itself satisfies the monotonicity
constraint. Second, penalization requires the choice of a tuning parameter that governs
the strength of penalization, which is a difficult statistical problem. In contrast, imposing
monotonicity does not require such choices and can often be motivated directly from
economic theory. □
4 Identification Bounds under Monotonicity


In the previous section, we derived non-asymptotic error bounds on the constrained
estimator in the NPIV model (1) assuming that $g$ is point-identified, or equivalently, that
the linear operator $T$ is invertible. Newey and Powell (2003) linked point-identification
of $g$ to completeness of the conditional distribution of $X$ given $W$, but this completeness
condition has been argued to be strong (Santos (2012)) and non-testable (Canay,
Santos, and Shaikh (2013)). In this section, we therefore discard the completeness condition
and explore the identification power of our monotonicity conditions, which appear
natural in many economic applications. Specifically, we derive informative bounds on the
identified set of functions $g$ satisfying (1). This means that, under our two monotonicity
assumptions, the identified set is a proper subset of all monotone functions $g \in \mathcal{M}$.

By a slight abuse of notation, we define the sign of the slope of a differentiable,
monotone function $f \in \mathcal{M}$ by

$$\operatorname{sign}(Df) := \begin{cases} 1, & Df(x) \ge 0\ \forall x \in [0,1] \text{ and } Df(x) > 0 \text{ for some } x \in [0,1], \\ 0, & Df(x) = 0\ \forall x \in [0,1], \\ -1, & Df(x) \le 0\ \forall x \in [0,1] \text{ and } Df(x) < 0 \text{ for some } x \in [0,1], \end{cases}$$

and the sign of a scalar $b$ by $\operatorname{sign}(b) := 1\{b > 0\} - 1\{b < 0\}$. We first show that if
the function $g$ is monotone, the sign of its slope is identified under our monotone IV
assumption (and some other technical conditions):

Theorem 3 (Identification of the sign of the slope). Suppose Assumptions 1 and 2 hold
and $f_{X,W}(x,w) > 0$ for all $(x,w) \in (0,1)^2$. If g is monotone and continuously
differentiable, then $\operatorname{sign}(Dg)$ is identified.

This theorem shows that, under certain regularity conditions, the monotone IV assumption
and monotonicity of the regression function g imply identification of the sign
of the regression function’s slope, even though the regression function itself is, in general,
not point-identified. This result is useful because in many empirical applications it is
natural to assume a monotone relationship between outcome variable Y and the endogenous
regressor X, given by the function g, but the main question of interest concerns not
the exact shape of g itself, but whether the effect of X on Y, given by the slope of g, is
positive, zero, or negative; see, for example, the discussion in Abrevaya, Hausman, and
Khan (2010).

Remark 16 (A test for the sign of the slope of g). In fact, Theorem 3 yields a surprisingly
simple way to test the sign of the slope of the function g. Indeed, the proof of Theorem
3 reveals that g is increasing, constant, or decreasing if the function $w \mapsto E[Y|W = w]$
is increasing, constant, or decreasing, respectively. By Chebyshev’s association inequality
(Lemma 5 in the appendix), the latter assertions are equivalent to the coefficient $\beta$ in the
linear regression model

$$Y = \alpha + \beta W + U, \quad E[UW] = 0 \qquad (19)$$

being positive, zero, or negative since $\operatorname{sign}(\beta) = \operatorname{sign}(\operatorname{cov}(W,Y))$ and

$$
\begin{aligned}
\operatorname{cov}(W, Y) &= E[WY] - E[W]E[Y] \\
&= E[W E[Y|W]] - E[W]E[E[Y|W]] = \operatorname{cov}(W, E[Y|W])
\end{aligned}
$$

by the law of iterated expectations. Therefore, under our conditions, hypotheses about the
sign of the slope of the function g can be tested by testing the corresponding hypotheses
about the sign of the slope coefficient $\beta$ in the linear regression model (19). In particular,
under our two monotonicity assumptions, one can test the hypothesis of “no effect” of X
on Y, i.e. that g is a constant, by testing whether $\beta = 0$ or not using the usual t-statistic.
The asymptotic theory for this statistic is exactly the same as in the standard regression
case with exogenous regressors, yielding the standard normal limiting distribution and,
therefore, completely avoiding the ill-posed inverse problem of recovering g. □
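
As an illustration of the test described in Remark 16, the following sketch regresses Y on W and reports the slope estimate with a t-statistic. It is only a sketch under stated assumptions: the heteroskedasticity-robust (HC0) standard error is one possible choice of "usual" t-statistic, and the helper names `slope_t_test` and `toy_sample`, as well as the toy data-generating design, are hypothetical and not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def slope_t_test(w, y):
    """OLS slope of y on (1, w) with a heteroskedasticity-robust (HC0) t-statistic."""
    n = len(w)
    w_bar = sum(w) / n
    y_bar = sum(y) / n
    s_ww = sum((wi - w_bar) ** 2 for wi in w)
    beta = sum((wi - w_bar) * (yi - y_bar) for wi, yi in zip(w, y)) / s_ww
    alpha = y_bar - beta * w_bar
    # HC0 variance of the slope: sum_i (w_i - w_bar)^2 * u_i^2 / s_ww^2
    meat = sum(((wi - w_bar) * (yi - alpha - beta * wi)) ** 2 for wi, yi in zip(w, y))
    se = math.sqrt(meat) / s_ww
    return beta, beta / se


def toy_sample(n, g, seed=0):
    """Hypothetical design: X is endogenous, W is a monotone instrument."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    w, y = [], []
    for _ in range(n):
        zeta, e, nu = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = phi(0.5 * zeta + math.sqrt(0.75) * e)    # X shifts upward with W
        u = 0.3 * (0.5 * e + math.sqrt(0.75) * nu)   # error correlated with X, independent of W
        w.append(phi(zeta))
        y.append(g(x) + u)
    return w, y


w, y = toy_sample(2000, lambda x: x ** 2)            # increasing g
beta_hat, t_stat = slope_t_test(w, y)
print(f"beta_hat = {beta_hat:.3f}, t = {t_stat:.2f}")
```

A positive and significant slope in this regression points to an increasing g, a negative one to a decreasing g, without ever estimating g itself.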

It turns out that our two monotonicity assumptions possess identifying power even 
beyond the slope of the regression function. 

Definition 1 (Identified set). We say that two functions $g', g'' \in L^2[0,1]$ are observationally
equivalent if $E[g'(X) - g''(X)|W] = 0$. The identified set $\Theta$ is defined as the set of all
functions $g' \in \mathcal{M}$ that are observationally equivalent to the true function g satisfying (1).

The following theorem provides necessary conditions for observational equivalence. 

Theorem 4 (Identification bounds). Let Assumptions 1 and 2 be satisfied, and let $g', g'' \in
L^2[0,1]$. Further, let $C := C_1/c_p$ where $C_1 := (\tilde{x}_2 - \tilde{x}_1)^{1/2}/\min\{\tilde{x}_1 - x_1, x_2 - \tilde{x}_2\}$ and
$c_p := \min\{1 - w_2, w_1\}\min\{C_F - 1, 2\}c_w c_f/4$. If there exists a function $h \in L^2[0,1]$ such
that $g' - g'' + h \in \mathcal{M}$ and $\|h\|_{2,t} + C\|T\|_2\|h\|_2 < \|g' - g''\|_{2,t}$, then $g'$ and $g''$ are not
observationally equivalent.

Under Assumption 3 that g is increasing, Theorem 4 suggests the construction of a
set $\Theta'$ that includes the identified set $\Theta$ by $\Theta' := \mathcal{M}_+ \setminus \Delta$, where $\mathcal{M}_+ := \mathcal{H}(0)$ denotes all
increasing functions in $\mathcal{M}$ and

$$
\Delta := \Big\{ g' \in \mathcal{M}_+ : \text{there exists } h \in L^2[0,1] \text{ such that }
g' - g + h \in \mathcal{M} \text{ and } \|h\|_{2,t} + C\|T\|_2\|h\|_2 < \|g' - g\|_{2,t} \Big\}. \qquad (20)
$$

We emphasize that $\Delta$ is not empty, which means that our Assumptions 1-3 possess
identifying power leading to nontrivial bounds on g. Notice that the constant C depends
only on the observable quantities $c_w$, $c_f$, and $C_F$ from Assumptions 1-2, and on the
known constants $x_1$, $x_2$, $\tilde{x}_1$, $\tilde{x}_2$, $w_1$, and $w_2$. Therefore, the set $\Theta'$ could, in principle, be
estimated, but we leave estimation and inference on this set to future research.

Remark 17 (Further insight on identification bounds). It is possible to provide more
insight into which functions are in $\Delta$ and thus not in $\Theta'$. First, under the additional
minor condition that $f_{X,W}(x,w) > 0$ for all $(x,w) \in (0,1)^2$, all functions in $\Theta'$ have to
intersect g; otherwise they are not observationally equivalent to g. Second, for a given
$g' \in \mathcal{M}_+$ and $h \in L^2[0,1]$ such that $g' - g + h$ is monotone, the inequality in condition
(20) is satisfied if $\|h\|_2$ is not too large relative to $\|g' - g\|_{2,t}$. In the extreme case, setting
h = 0 shows that $\Theta'$ does not contain elements $g'$ that disagree with g on $[\tilde{x}_1, \tilde{x}_2]$ and such
that $g' - g$ is monotone. More generally, $\Theta'$ does not contain elements $g'$ whose difference
with g is too close to a monotone function. Therefore, for example, functions $g'$ that are
much steeper than g are excluded from $\Theta'$. □

5 Testing the Monotonicity Assumptions 

In this section, we propose tests of our two monotonicity assumptions based on an i.i.d.
sample $(X_i, W_i)$, i = 1, ..., n, from the distribution of (X, W). First, we discuss an
adaptive procedure for testing the stochastic dominance condition (4) in our monotone
IV Assumption 1. The null and alternative hypotheses are

$$H_0: F_{X|W}(x|w') \ge F_{X|W}(x|w'') \text{ for all } x, w', w'' \in (0,1) \text{ with } w' \le w''$$

$$H_a: F_{X|W}(x|w') < F_{X|W}(x|w'') \text{ for some } x, w', w'' \in (0,1) \text{ with } w' \le w'',$$

respectively. The null hypothesis, $H_0$, is equivalent to stochastic monotonicity of the
conditional distribution function $F_{X|W}(x|w)$. Although there exist several good tests of
$H_0$ in the literature (see Lee, Linton, and Whang (2009), Delgado and Escanciano (2012)
and Lee, Song, and Whang (2014), for example), to the best of our knowledge there
does not exist any procedure that adapts to the unknown smoothness level of $F_{X|W}(x|w)$.
We provide a test that is adaptive in this sense, a feature that is not only theoretically
attractive, but also important in practice: it delivers a data-driven choice of the smoothing
parameter $h_n$ (bandwidth value) of the test whereas nonadaptive tests are usually based
on the assumption that $h_n \to 0$ with some rate in a range of prespecified rates, leaving the
problem of the selection of an appropriate value of $h_n$ in a given data set to the researcher
(see, for example, Lee, Linton, and Whang (2009) and Lee, Song, and Whang (2014)).
We develop the critical value for the test that takes into account the data dependence
induced by the data-driven choice of the smoothing parameter. Our construction leads
to a test that controls size, and is asymptotically non-conservative.

Our test is based on the ideas in Chetverikov (2012) who in turn builds on the methods
for adaptive specification testing in Horowitz and Spokoiny (2001) and on the theoretical
results on high dimensional distributional approximations in Chernozhukov, Chetverikov,
and Kato (2013c) (CCK). Note that $F_{X|W}(x|w) = E[1\{X \le x\}|W = w]$, so that for a
fixed $x \in (0,1)$, the hypothesis that $F_{X|W}(x|w') \ge F_{X|W}(x|w'')$ for all $0 \le w' \le w'' \le 1$
is equivalent to the hypothesis that the regression function $w \mapsto E[1\{X \le x\}|W = w]$ is
decreasing. An adaptive test of this hypothesis was developed in Chetverikov (2012). In
our case, $H_0$ requires the regression function $w \mapsto E[1\{X \le x\}|W = w]$ to be decreasing
not only for a particular value $x \in (0,1)$ but for all $x \in (0,1)$, and so we need to extend
the results obtained in Chetverikov (2012).

Let $K : \mathbb{R} \to \mathbb{R}$ be a kernel function satisfying the following conditions:

Assumption 9 (Kernel). The kernel function $K : \mathbb{R} \to \mathbb{R}$ is such that (i) $K(w) > 0$
for all $w \in (-1,1)$, (ii) $K(w) = 0$ for all $w \notin (-1,1)$, (iii) K is continuous, and (iv)
$\int_{-\infty}^{\infty} K(w)\,dw = 1$.

We assume that the kernel function K(w) has bounded support, is continuous, and is
strictly positive on the support. The last condition excludes higher-order kernels. For a
bandwidth value h > 0, define

$$K_h(w) := h^{-1}K(w/h), \quad w \in \mathbb{R}.$$

Suppose $H_0$ is satisfied. Then, by the law of iterated expectations,

$$E\big[(1\{X_i \le x\} - 1\{X_j \le x\})\operatorname{sign}(W_i - W_j)K_h(W_i - w)K_h(W_j - w)\big] \le 0 \qquad (21)$$

for all $x, w \in (0,1)$ and i, j = 1, ..., n. Denoting

$$K_{ij,h}(w) := \operatorname{sign}(W_i - W_j)K_h(W_i - w)K_h(W_j - w),$$

taking the sum of the left-hand side in (21) over i, j = 1, ..., n, and rearranging give

$$E\Big[\sum_{i=1}^n 1\{X_i \le x\}\sum_{j=1}^n \big(K_{ij,h}(w) - K_{ji,h}(w)\big)\Big] \le 0,$$

or, equivalently,

$$E\Big[\sum_{i=1}^n k_{i,h}(w)1\{X_i \le x\}\Big] \le 0, \qquad (22)$$

where

$$k_{i,h}(w) := \sum_{j=1}^n \big(K_{ij,h}(w) - K_{ji,h}(w)\big).$$

To define the test statistic T, let $\mathcal{B}_n$ be a collection of bandwidth values satisfying the
following conditions:

Assumption 10 (Bandwidth values). The collection of bandwidth values is $\mathcal{B}_n := \{h \in
\mathbb{R} : h = u^l/2,\ l = 0, 1, 2, \dots,\ h \ge h_{\min}\}$ for some $u \in (0,1)$ where $h_{\min} := h_{\min,n}$ is such
that $1/(n h_{\min}) \le C_h n^{-c_h}$ for some constants $c_h, C_h > 0$.

The collection of bandwidth values $\mathcal{B}_n$ is a geometric progression with the coefficient
$u \in (0,1)$, the largest value 1/2, and the smallest value converging to zero not too fast.
As the sample size n increases, the collection of bandwidth values $\mathcal{B}_n$ expands.
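
To make Assumption 10 concrete, the sketch below builds the geometric bandwidth grid for a given sample size. The function name `bandwidth_grid` and the default values of u, c_h, and C_h are illustrative assumptions; the smallest admissible bandwidth is taken to be the one for which the condition on h_min holds with equality.

```python
def bandwidth_grid(n, u=0.5, c_h=0.5, C_h=1.0):
    """Return B_n = {h = u**l / 2 : l = 0, 1, 2, ..., h >= h_min}.

    h_min is the smallest value allowed by 1 / (n * h_min) <= C_h * n**(-c_h),
    i.e. h_min = n**(c_h - 1) / C_h (illustrative choice of the constants).
    """
    h_min = n ** (c_h - 1.0) / C_h
    grid = []
    h = 0.5                      # largest bandwidth, l = 0
    while h >= h_min:
        grid.append(h)
        h *= u                   # geometric progression with coefficient u
    return grid


print(bandwidth_grid(1000))      # grid for n = 1,000
print(len(bandwidth_grid(100000)))
```

With these defaults the grid grows as n grows, which is the sense in which the collection of bandwidth values expands with the sample size.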

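Let $\mathcal{W}_n := \{W_1, \dots, W_n\}$, and $\mathcal{X}_n := \{\epsilon + l(1 - 2\epsilon)/n : l = 0, 1, \dots, n\}$ for some small
$\epsilon > 0$. We define our test statistic by

$$T := \max_{(x,w,h) \in \mathcal{X}_n \times \mathcal{W}_n \times \mathcal{B}_n} \frac{\sum_{i=1}^n k_{i,h}(w)1\{X_i \le x\}}{\big(\sum_{i=1}^n k_{i,h}(w)^2\big)^{1/2}}. \qquad (23)$$

The statistic T is most closely related to that in Lee, Linton, and Whang (2009). The
main difference is that we take the maximum with respect to the set of bandwidth values
$h \in \mathcal{B}_n$ to achieve adaptiveness of the test.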

We now discuss the construction of a critical value for the test. Suppose that we
would like to have a test of level (approximately) $\alpha$. As succinctly demonstrated by
Lee, Linton, and Whang (2009), the derivation of the asymptotic distribution of T is
complicated even when $\mathcal{B}_n$ is a singleton. Moreover, when $\mathcal{B}_n$ is not a singleton, it is
generally unknown whether T converges to some nondegenerate asymptotic distribution
after an appropriate normalization. We avoid these complications by employing the
non-asymptotic approach developed in CCK and using a multiplier bootstrap critical value
for the test. Let $e_1, \dots, e_n$ be an i.i.d. sequence of N(0,1) random variables that are
independent of the data. Also, let $\hat{F}_{X|W}(x|w)$ be an estimator of $F_{X|W}(x|w)$ satisfying
the following conditions:

Assumption 11 (Estimator of $F_{X|W}(x|w)$). The estimator $\hat{F}_{X|W}(x|w)$ of $F_{X|W}(x|w)$ is
such that (i)

$$P\Big(P\Big(\max_{(x,w) \in \mathcal{X}_n \times \mathcal{W}_n} |\hat{F}_{X|W}(x|w) - F_{X|W}(x|w)| > C_F n^{-c_F} \,\Big|\, \mathcal{W}_n\Big) > C_F n^{-c_F}\Big) \le C_F n^{-c_F}$$

for some constants $c_F, C_F > 0$, and (ii) $|\hat{F}_{X|W}(x|w)| \le C_F$ for all $(x,w) \in \mathcal{X}_n \times \mathcal{W}_n$.

This is a mild assumption implying uniform consistency of an estimator $\hat{F}_{X|W}(x|w)$ of
$F_{X|W}(x|w)$ over $(x,w) \in \mathcal{X}_n \times \mathcal{W}_n$. Define a bootstrap test statistic by

$$T^b := \max_{(x,w,h) \in \mathcal{X}_n \times \mathcal{W}_n \times \mathcal{B}_n} \frac{\sum_{i=1}^n e_i\big(k_{i,h}(w)(1\{X_i \le x\} - \hat{F}_{X|W}(x|W_i))\big)}{\big(\sum_{i=1}^n k_{i,h}(w)^2\big)^{1/2}}.$$

Then we define the critical value⁹ $c(\alpha)$ for the test as

$c(\alpha) := (1 - \alpha)$ conditional quantile of $T^b$ given the data.
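
The multiplier bootstrap critical value can be sketched along the same lines. This is an illustration under assumptions, not the authors' implementation: the Epanechnikov kernel, the Nadaraya-Watson estimator `cdf_hat` of the conditional distribution function and its bandwidth `bw`, the number of bootstrap draws, and all function names are hypothetical choices.

```python
import math
import random


def kernel(v):
    """Epanechnikov kernel (an illustrative choice)."""
    return 0.75 * (1.0 - v * v) if abs(v) < 1.0 else 0.0


def k_weights(W, w, h):
    """k_{i,h}(w) = 2 * sum_j sign(W_i - W_j) K_h(W_i - w) K_h(W_j - w)."""
    kh = [kernel((Wi - w) / h) / h for Wi in W]
    return [2.0 * kh[i] * sum(((W[i] > Wj) - (W[i] < Wj)) * kj for Wj, kj in zip(W, kh))
            for i in range(len(W))]


def cdf_hat(x, w, X, W, bw):
    """Nadaraya-Watson estimate of F_{X|W}(x|w); assumed estimator, evaluated at sample points."""
    weights = [kernel((Wj - w) / bw) for Wj in W]
    return sum(wt for wt, Xj in zip(weights, X) if Xj <= x) / sum(weights)


def bootstrap_critical_value(X, W, bandwidths, alpha=0.05, n_boot=200,
                             eps=0.05, bw=0.3, seed=0):
    n = len(X)
    rng = random.Random(seed)
    x_grid = [eps + l * (1.0 - 2.0 * eps) / n for l in range(n + 1)]
    # Normalized weight vectors k_{i,h}(w) / ||k_{.,h}(w)|| for every (w, h).
    k_norm = []
    for h in bandwidths:
        for w in W:
            k = k_weights(W, w, h)
            denom = math.sqrt(sum(ki * ki for ki in k))
            if denom > 0.0:
                k_norm.append([ki / denom for ki in k])
    # Rows a_i = k_i * (1{X_i <= x} - F_hat(x|W_i)) / ||k|| for every (x, w, h).
    rows = []
    for x in x_grid:
        resid = [(Xi <= x) - cdf_hat(x, Wi, X, W, bw) for Xi, Wi in zip(X, W)]
        rows.extend([ki * ri for ki, ri in zip(k, resid)] for k in k_norm)
    draws = []
    for _ in range(n_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]   # Gaussian multipliers
        draws.append(max(sum(ei * ai for ei, ai in zip(e, a)) for a in rows))
    draws.sort()
    return draws[min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1)]
```

The test then rejects when the statistic T computed on the same grid exceeds the returned quantile. Precomputing the rows once keeps each bootstrap draw to a set of inner products with the multipliers.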

We reject $H_0$ if and only if $T > c(\alpha)$. To prove validity of this test, we assume that
the conditional distribution function $F_{X|W}(x|w)$ satisfies the following condition:

Assumption 12 (Conditional Distribution Function $F_{X|W}(x|w)$). The conditional
distribution function $F_{X|W}(x|w)$ is such that $c_\epsilon \le F_{X|W}(\epsilon|w) \le F_{X|W}(1 - \epsilon|w) \le C_\epsilon$ for all
$w \in (0,1)$ and some constants $0 < c_\epsilon < C_\epsilon < 1$.

The first theorem in this section shows that our test controls size asymptotically and
is not conservative:

Theorem 5 (Polynomial Size Control). Let Assumptions 2, 9, 10, 11, and 12 be satisfied.
If $H_0$ holds, then

$$P(T > c(\alpha)) \le \alpha + Cn^{-c}. \qquad (24)$$

If the functions $w \mapsto F_{X|W}(x|w)$ are constant for all $x \in (0,1)$, then

$$|P(T > c(\alpha)) - \alpha| \le Cn^{-c}. \qquad (25)$$

In both (24) and (25), the constants c and C depend only on $c_w$, $C_w$, $c_h$, $C_h$, $c_F$, $C_F$, $c_\epsilon$, $C_\epsilon$,
and the kernel K.

Remark 18 (Weak Condition on the Bandwidth Values). Our theorem requires

$$\frac{1}{nh} \le C_h n^{-c_h} \qquad (26)$$

for all $h \in \mathcal{B}_n$, which is considerably weaker than the analogous condition in Lee, Linton,
and Whang (2009) who require $1/(nh^3) \to 0$, up to logs. This is achieved by using a
conditional test and by applying the results of CCK. As follows from the proof of the
theorem, the multiplier bootstrap distribution approximates the conditional distribution
of the test statistic given $\mathcal{W}_n = \{W_1, \dots, W_n\}$. Conditional on $\mathcal{W}_n$, the denominator in
the definition of T is fixed, and does not require any approximation. Instead, we could
try to approximate the denominator of T by its probability limit. This is done in Ghosal,
Sen, and Vaart (2000) using the theory of Hoeffding projections but they require the
condition $1/(nh^2) \to 0$. Our weak condition (26) also crucially relies on the fact that we
use the results of CCK. Indeed, it has already been demonstrated (see Chernozhukov,
Chetverikov, and Kato (2013a,b), and Belloni, Chernozhukov, Chetverikov, and Kato
(2014)) that, in typical nonparametric problems, the techniques of CCK often lead to
weak conditions on the bandwidth value or the number of series terms. Our theorem is
another instance of this fact. □

⁹ In the terminology of the moment inequalities literature, $c(\alpha)$ can be considered a “one-step” or
“plug-in” critical value. Following Chetverikov (2012), we could also consider two-step or even multi-step
(stepdown) critical values. For brevity of the paper, however, we do not consider these options here.

Remark 19 (Polynomial Size Control). Note that, by (24) and (25), the probability of
rejecting $H_0$ when $H_0$ is satisfied can exceed the nominal level $\alpha$ only by a term that
is polynomially small in n. We refer to this phenomenon as a polynomial size control.
As explained in Lee, Linton, and Whang (2009), when $\mathcal{B}_n$ is a singleton, convergence of
T to the limit distribution is logarithmically slow. Therefore, Lee, Linton, and Whang
(2009) used higher-order corrections derived in Piterbarg (1996) to obtain polynomial size
control. Here we show that the multiplier bootstrap also gives higher-order corrections
and leads to polynomial size control. This feature of our theorem is also inherited from
the results of CCK. □

Remark 20 (Uniformity). The constants c and C in (24) and (25) depend on the data
generating process only via constants (and the kernel) appearing in Assumptions 2, 9, 10,
11, and 12. Therefore, inequalities (24) and (25) hold uniformly over all data generating
processes satisfying these assumptions with the same constants. We obtain uniformity
directly from employing the distributional approximation theorems of CCK because they
are non-asymptotic and do not rely on convergence arguments. □

Our second result in this section concerns the ability of our test to detect models in
the alternative $H_a$. Let $\epsilon > 0$ be the constant appearing in the definition of T via the set
$\mathcal{X}_n$.

Theorem 6 (Consistency). Let Assumptions 2, 9, 10, 11, and 12 be satisfied and assume
that $F_{X|W}(x|w)$ is continuously differentiable. If $H_a$ holds with $D_w F_{X|W}(x|w) > 0$ for
some $x \in (\epsilon, 1 - \epsilon)$ and $w \in (0,1)$, then

$$P(T > c(\alpha)) \to 1 \text{ as } n \to \infty. \qquad (27)$$

This theorem shows that our test is consistent against any model in $H_a$ (with smooth
$F_{X|W}(x|w)$) whose deviation from $H_0$ is not on the boundary, so that the deviation
$D_w F_{X|W}(x|w) > 0$ occurs for $x \in (\epsilon, 1 - \epsilon)$. It is also possible to extend our results
to show that Theorems 5 and 6 hold with $\epsilon = 0$ at the expense of additional technicalities.
Further, using the same arguments as those in Chetverikov (2012), it is possible to
show that the test suggested here has minimax optimal rate of consistency against the
alternatives belonging to certain Hölder classes for a reasonably large range of smoothness
levels. We do not derive these results here for the sake of brevity of presentation.

We conclude this section by proposing a simple test of our second monotonicity assumption,
that is, monotonicity of the regression function g. The null and alternative
hypotheses are

$$H_0: g(x') \le g(x'') \text{ for all } x', x'' \in (0,1) \text{ with } x' \le x''$$

$$H_a: g(x') > g(x'') \text{ for some } x', x'' \in (0,1) \text{ with } x' \le x'',$$

respectively. The discussion in Remark 16 reveals that, under Assumptions 1 and 2,
monotonicity of g(x) implies monotonicity of $w \mapsto E[Y|W = w]$. Therefore, under
Assumptions 1 and 2, we can test $H_0$ by testing monotonicity of the conditional expectation
$w \mapsto E[Y|W = w]$ using existing tests such as Chetverikov (2012) and Lee, Song, and
Whang (2014), among others. This procedure tests an implication of $H_0$ instead of $H_0$
itself and therefore may have low power against some alternatives. On the other hand, it
does not require solving the model for g(x) and therefore avoids the ill-posedness of the
problem.

6 Simulations 

In this section, we study the finite-sample behavior of our constrained estimator that
imposes monotonicity and compare its performance to that of the unconstrained estimator.
We consider the NPIV model $Y = g(X) + \varepsilon$, $E[\varepsilon|W] = 0$, for two different regression
functions, one that is strictly increasing and a weakly increasing one that is constant over
part of its domain:

Model 1: $g(x) = \kappa \sin(\pi x - \pi/2)$

Model 2: $g(x) = 10\kappa\big[-(x - 0.25)^2 1\{x \in [0, 0.25]\} + (x - 0.75)^2 1\{x \in [0.75, 1]\}\big]$

where $\varepsilon = \kappa\sigma_\varepsilon\bar{\varepsilon}$ and $\bar{\varepsilon} = \eta\epsilon + \sqrt{1 - \eta^2}\,\nu$. The regressor and instrument are generated
by $X = \Phi(\xi)$ and $W = \Phi(\zeta)$, respectively, where $\Phi$ is the standard normal cdf and
$\xi = \rho\zeta + \sqrt{1 - \rho^2}\,\epsilon$. The errors are generated by $(\nu, \zeta, \epsilon) \sim N(0, I)$.
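
The simulation design can be sketched as follows. The code reflects one reading of the data-generating process described above (the error scaling and the roles of the parameters rho and eta in particular are taken from that description); the function names and the default parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def g_model1(x, kappa):
    """Strictly increasing regression function (Model 1)."""
    return kappa * math.sin(math.pi * x - math.pi / 2)


def g_model2(x, kappa):
    """Weakly increasing regression function, flat on [0.25, 0.75] (Model 2)."""
    left = -(x - 0.25) ** 2 if x <= 0.25 else 0.0
    right = (x - 0.75) ** 2 if x >= 0.75 else 0.0
    return 10.0 * kappa * (left + right)


def simulate(n, g, kappa=1.0, sigma_eps=0.5, rho=0.3, eta=0.3, seed=0):
    """Draw n observations (Y, X, W); default parameter values are illustrative only."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        nu, zeta, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        xi = rho * zeta + math.sqrt(1.0 - rho ** 2) * e        # first stage: strength rho
        eps_bar = eta * e + math.sqrt(1.0 - eta ** 2) * nu     # endogeneity: strength eta
        x, w = PHI(xi), PHI(zeta)
        y = g(x, kappa) + kappa * sigma_eps * eps_bar
        sample.append((y, x, w))
    return sample


sample = simulate(500, g_model2, kappa=0.5)
print(sample[0])
```

Because the regression error is built from draws that are independent of the instrument's latent normal, the instrument is valid by construction, while the shared draw `e` makes the regressor endogenous.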

We vary the parameter $\kappa$ in {1, 0.5, 0.1} to study how the constrained and unconstrained
estimators’ performance compares depending on the maximum slope of the regression
function. $\eta$ governs the dependence of X on the regression error $\varepsilon$ and $\rho$ the
strength of the first stage. All results are based on 1,000 MC samples and the normalized
B-spline basis for p(x) and q(w) of degree 3 and 4, respectively.

Tables 1-4 report the Monte Carlo approximations to the squared bias, variance, and
mean squared error (“MSE”) of the two estimators, each averaged over a grid on the
interval [0,1]. We also show the ratio of the constrained estimator’s MSE divided by the
unconstrained estimator’s MSE. $k_X$ and $k_W$ denote, respectively, the number of knots used
for the basis p(x) and q(w). The first two tables vary the number of knots, and the latter
two the dependence parameters $\rho$ and $\eta$. Different sample sizes and different values for
$\rho$, $\eta$, and $\sigma_\varepsilon$ yield qualitatively similar results. Figures 2 and 3 show the two estimators
for a particular combination of the simulation parameters. The dashed lines represent
confidence bands, computed as two times the (pointwise) empirical standard deviation of
the estimators across simulation samples. Both the constrained and the unconstrained
estimators are computed by ignoring the bound $\|b\| \le C_b$ in their respective definitions.
Horowitz and Lee (2012) and Horowitz (2012) also ignore the constraint $\|b\| \le C_b$ and
state that it does not affect the qualitative results of their simulation experiment.
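
The summary measures reported in the tables can be computed from simulated estimates as in the sketch below. The function names and the input layout (one list of grid evaluations per Monte Carlo sample) are assumptions; the variance uses the divisor R, so that the reported MSE equals squared bias plus variance exactly.

```python
def mc_summary(estimates, truth):
    """Grid-averaged squared bias, variance, and MSE across Monte Carlo samples.

    estimates: list of R lists, each holding an estimate of g at the G grid points.
    truth:     list of the G true values of g on the same grid.
    """
    R, G = len(estimates), len(truth)
    bias2 = var = 0.0
    for j in range(G):
        column = [est[j] for est in estimates]
        mean_j = sum(column) / R
        bias2 += (mean_j - truth[j]) ** 2
        var += sum((c - mean_j) ** 2 for c in column) / R
    bias2, var = bias2 / G, var / G
    return {"bias2": bias2, "var": var, "mse": bias2 + var}


def mse_ratio(constrained, unconstrained, truth):
    """Constrained estimator's MSE divided by the unconstrained estimator's MSE."""
    return mc_summary(constrained, truth)["mse"] / mc_summary(unconstrained, truth)["mse"]


summary = mc_summary([[1.0, 2.0], [3.0, 2.0]], [2.0, 1.0])
print(summary)
```

A ratio below one indicates that imposing monotonicity lowers the grid-averaged MSE in the given design.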

The MSE of the constrained estimator (and, interestingly, also of the unconstrained
estimator) decreases as the regression function becomes flatter. This observation is
consistent with the error bound in Theorem 2 depending positively on the maximum slope
of g.

Because of the joint normality of (X, W), the simulation design is severely ill-posed
and we expect high variability of both estimators. In all simulation scenarios, we do in
fact observe a very large variance relative to bias. However, the magnitude of the variance
differs significantly across the two estimators: in all scenarios, even in the design with a
strictly increasing regression function, imposing the monotonicity constraint significantly
reduces the variance of the NPIV estimator. The MSE of the constrained estimator is
therefore much smaller than that of the unconstrained estimator, from about a factor of
two smaller when g is strictly increasing and the noise level is low ($\sigma_\varepsilon = 0.1$), to around 20
times smaller when g contains a flat part and the noise level is high ($\sigma_\varepsilon = 0.7$). Generally,
the gains in MSE from imposing monotonicity are larger the higher the noise level $\sigma_\varepsilon$ in
the regression equation and the higher the first-stage correlation $\rho$.¹⁰

¹⁰ Since Tables 1 and 2 report results for the lower level of $\rho$, and Tables 3 and 4 results for the lower
noise level, we consider the selection of results as, if at all, favoring the unconstrained estimator.

7 Gasoline Demand in the United States

In this section, we revisit the problem of estimating demand functions for gasoline in the
United States. Because of the dramatic changes in the oil price over the last few decades,
understanding the elasticity of gasoline demand is fundamental to evaluating tax policies.
Consider the following partially linear specification of the demand function:

$$Y = g(X, Z_1) + \gamma' Z_2 + \varepsilon, \quad E[\varepsilon|W, Z_1, Z_2] = 0,$$

where Y denotes annual log-gasoline consumption of a household, X log-price of gasoline
(average local price), $Z_1$ log-household income, $Z_2$ are control variables (such as population
density, urbanization, and demographics), and W distance to major oil platform. We
allow for price X to be endogenous, but assume that $(Z_1, Z_2)$ is exogenous. W serves
as an instrument for price by capturing transport cost and, therefore, shifting the cost
of gasoline production. We use the same sample of size 4,812 from the 2001 National
Household Travel Survey and the same control variables $Z_2$ as Blundell, Horowitz, and
Parey (2012). More details can be found in their paper.

Moving away from constant price and income elasticities is likely very important as
individuals’ responses to price changes vary greatly with price and income level. Since
economic theory does not provide guidance on the functional form of g, finding an
appropriate parametrization is difficult. Hausman and Newey (1995) and Blundell, Horowitz,
and Parey (2012), for example, demonstrate the importance of employing flexible estimators
of g that do not suffer from misspecification bias due to arbitrary restrictions in the
model. Blundell, Horowitz, and Parey (2013) argue that prices at the local market level
vary for several reasons and that they may reflect preferences of the consumers in the local
market. Therefore, one would expect prices X to depend on unobserved factors in $\varepsilon$ that
determine consumption, rendering price an endogenous variable. Furthermore, the theory
of the consumer requires downward-sloping compensated demand curves. Assuming a
positive income derivative¹¹ $\partial g/\partial z_1$, the Slutsky condition implies that the uncompensated
(Marshallian) demand curves are also downward-sloping, i.e. $g(\cdot, z_1)$ should be monotone
for any $z_1$, as long as income effects do not completely offset price effects. Finally,
we expect the cost shifter W to monotonically increase cost of producing gasoline and
thus satisfy our monotone IV condition. In conclusion, our constrained NPIV estimator
appears to be an attractive estimator of demand functions in this setting.

¹¹ Blundell, Horowitz, and Parey (2012) estimate this income derivative and do, in fact, find it to be
positive over the price range of interest.

We consider three benchmark estimators. First, we compute the unconstrained
nonparametric (“uncon. NP”) series estimator of the regression of Y on X and $Z_1$, treating
price as exogenous. As in Blundell, Horowitz, and Parey (2012), we accommodate
the high-dimensional vector of additional, exogenous covariates $Z_2$ by (i) estimating $\gamma$
by Robinson (1988)’s procedure, (ii) then removing these covariates from the outcome,
and (iii) estimating g by regressing the adjusted outcomes on X and $Z_1$. The second
benchmark estimator (“con. NP”) repeats the same steps (i)-(iii) except that it imposes
monotonicity (in price) of g in steps (i) and (iii). The third benchmark estimator is the
unconstrained NPIV estimator (“uncon. NPIV”) that accounts for the covariates $Z_2$ in
similar fashion as the first, unconstrained nonparametric estimator, except that (i) and
(iii) employ NPIV estimators that impose additive separability and linearity in $Z_2$.

The fourth estimator we consider is the constrained NPIV estimator (“con. NPIV”)
that we compare to the three benchmark estimators. We allow for the presence of the
covariates $Z_2$ in the same fashion as the unconstrained NPIV estimator except that, in
steps (i) and (iii), we impose monotonicity in price.

We report results for the following choice of bases. All estimators employ a quadratic
B-spline basis with 3 knots for price X and a cubic B-spline with 10 knots for the
instrument W. Denote these two bases by P and Q, using the same notation as in Section 3.
In step (i), the NPIV estimators include the additional exogenous covariates $(Z_1, Z_2)$ in
the respective bases for X and W, so they use the estimator defined in Section 3 except
that the bases P and Q are replaced by $\tilde{P} := [P, P \times Z_1, Z_2]$ and $\tilde{Q} := [Q, Q \times (Z_1, Z_2)]$,
respectively, where $Z_k := (Z_{k,1}, \dots, Z_{k,n})'$, k = 1, 2, stacks the observations i = 1, ..., n
and $P \times Z_1$ denotes the tensor product of the columns of the two matrices. Since, in the
basis $\tilde{P}$, we include interactions of P with $Z_1$, but not with $Z_2$, the resulting estimator
allows for a nonlinear, nonseparable dependence of Y on X and $Z_1$, but imposes additive
separability in $Z_2$. The conditional expectation of Y given W, $Z_1$, and $Z_2$ does not have
to be additively separable in $Z_2$, so that, in the basis $\tilde{Q}$, we include interactions of Q with
both $Z_1$ and $Z_2$.¹²

We estimated the demand functions for many different combinations of the order of
B-spline for W, the number of knots in both bases, and even with various penalization terms
(as discussed in Remark 15). While the shape of the unconstrained NPIV estimate varied 
slightly across these different choices of tuning parameters (mostly near the boundary of 
the support of X), the constrained NPIV estimator did not exhibit any visible changes 
at all. 

Figure 4 shows a nonparametric kernel estimate of the conditional distribution of the
price X given the instrument W. Overall the graph indicates an increasing relationship
between the two variables as required by our stochastic dominance condition (4). We
formally test this monotone IV assumption by applying our new test proposed in Section 5.
We find a test statistic value of 0.139 and 95%-critical value of 1.720.¹³ Therefore, we fail
to reject the monotone IV assumption.

¹² Notice that P and Q include constant terms so it is not necessary to separately include $Z_k$ in addition
to its interactions with P and Q, respectively.

¹³ The critical value is computed from 1,000 bootstrap samples, using the bandwidth set $\mathcal{B}_n$ =
{2, 1, 0.5, 0.25, 0.125, 0.0625}, and a kernel estimator for $F_{X|W}$ with bandwidth 0.3 which produces the
estimate in Figure 4.

Figure 5 shows the estimates of the demand function at three income levels, at the lower
quartile ($42,500), the median ($57,500), and the upper quartile ($72,500). The area
shaded in grey represents the 90% uniform confidence bands around the unconstrained
NPIV estimator as proposed in Horowitz and Lee (2012).¹⁴ The black lines correspond
to the estimators assuming exogeneity of price and the red lines to the NPIV estimators
that allow for endogeneity of price. The dashed black line shows the kernel estimate of
Blundell, Horowitz, and Parey (2012) and the solid black line the corresponding series
estimator that imposes monotonicity. The dashed and solid red lines similarly depict the
unconstrained and constrained NPIV estimators, respectively.

All estimates show an overall decreasing pattern of the demand curves, but the two
unconstrained estimators are both increasing over some parts of the price domain. We
view these implausible increasing parts as finite-sample phenomena that arise because the
unconstrained nonparametric estimators are too imprecise. The wide confidence bands of
the unconstrained NPIV estimator are consistent with this view. Hausman and Newey
(1995) and Horowitz and Lee (2012) find similar anomalies in their nonparametric
estimates, assuming exogenous prices. Unlike the unconstrained estimates, our constrained
NPIV estimates are downward-sloping everywhere and smoother. They lie within the
90% uniform confidence bands of the unconstrained estimator so that the monotonicity
constraint appears compatible with the data.

The two constrained estimates are very similar, indicating that endogeneity of prices
may not be important in this problem, but they are both significantly flatter than the
unconstrained estimates across all three income groups, which implies that households
appear to be less sensitive to price changes than the unconstrained estimates suggest.
The small maximum slope of the constrained NPIV estimator also suggests that the error
bound in Theorem 2 may be small and therefore we expect the constrained NPIV estimate
to be precise for this data set.


¹⁴ Critical values are computed from 1,000 bootstrap samples and the bands are computed on a grid of
100 equally-spaced points in the support of the data for X.

A Proofs for Section 2

For any $h \in L^1[0,1]$, let $\|h\|_1 := \int_0^1 |h(x)|dx$, $\|h\|_{1,t} := \int_{x_1}^{x_2} |h(x)|dx$ and define the operator
norm by $\|T\|_2 := \sup_{h \in L^2[0,1]:\, \|h\|_2 > 0} \|Th\|_2/\|h\|_2$. Note that $\|T\|_2 \le \int_0^1\int_0^1 f_{X,W}^2(x,w)\,dx\,dw$,
and so under Assumption 2, $\|T\|_2 \le C_T$.

Proof of Theorem 1. We first show that for any $h \in \mathcal{M}$,

$$\|h\|_{2,t} \le C_1\|h\|_{1,t} \qquad (28)$$

for $C_1 := (\tilde{x}_2 - \tilde{x}_1)^{1/2}/\min\{\tilde{x}_1 - x_1, x_2 - \tilde{x}_2\}$. Indeed, by monotonicity of h,

$$
\begin{aligned}
\|h\|_{2,t} = \Big(\int_{\tilde{x}_1}^{\tilde{x}_2} h(x)^2 dx\Big)^{1/2}
&\le \sqrt{\tilde{x}_2 - \tilde{x}_1}\,\max\{|h(\tilde{x}_1)|, |h(\tilde{x}_2)|\} \\
&\le \sqrt{\tilde{x}_2 - \tilde{x}_1}\,\frac{\int_{x_1}^{x_2} |h(x)|dx}{\min\{\tilde{x}_1 - x_1, x_2 - \tilde{x}_2\}}
\end{aligned}
$$

so that (28) follows. Therefore, for any increasing continuously differentiable $h \in \mathcal{M}$,

$$\|h\|_{2,t} \le C_1\|h\|_{1,t} \le C_1C_2\|Th\|_1 \le C_1C_2\|Th\|_2,$$

where the first inequality follows from (28), the second from Lemma 2 below (which is
the main step in the proof of the theorem), and the third by Jensen’s inequality. Hence,
conclusion (7) of Theorem 1 holds for increasing continuously differentiable $h \in \mathcal{M}$ with
$C := C_1C_2$ and $C_2$ as defined in Lemma 2.

Next, for any increasing function $h \in \mathcal{M}$, it follows from Lemma 9 that one can find
a sequence of increasing continuously differentiable functions $h_k \in \mathcal{M}$, $k \ge 1$, such that
$\|h_k - h\|_2 \to 0$ as $k \to \infty$. Therefore, by the triangle inequality,

$$
\begin{aligned}
\|h\|_{2,t} &\le \|h_k\|_{2,t} + \|h_k - h\|_{2,t} \le C\|Th_k\|_2 + \|h_k - h\|_{2,t} \\
&\le C\|Th\|_2 + C\|T(h_k - h)\|_2 + \|h_k - h\|_{2,t} \\
&\le C\|Th\|_2 + C\|T\|_2\|h_k - h\|_2 + \|h_k - h\|_{2,t} \\
&\le C\|Th\|_2 + (C\|T\|_2 + 1)\|h_k - h\|_2 \\
&\le C\|Th\|_2 + (CC_T + 1)\|h_k - h\|_2
\end{aligned}
$$

where the third line follows from the Cauchy-Schwarz inequality, the fourth from
$\|h_k - h\|_{2,t} \le \|h_k - h\|_2$, and the fifth from Assumption 2(i). Taking the limit as $k \to \infty$ of
both the left-hand and the right-hand sides of this chain of inequalities yields conclusion
(7) of Theorem 1 for all increasing $h \in \mathcal{M}$.

Finally, since for any decreasing $h \in \mathcal{M}$, we have that $-h \in \mathcal{M}$ is increasing, $\|-h\|_{2,t} =
\|h\|_{2,t}$ and $\|Th\|_2 = \|T(-h)\|_2$, conclusion (7) of Theorem 1 also holds for all decreasing
$h \in \mathcal{M}$, and thus for all $h \in \mathcal{M}$. This completes the proof of the theorem. Q.E.D.

Lemma 2. Let Assumptions 1 and 2 hold. Then for any increasing continuously
differentiable $h \in L^1[0,1]$,

$$\|h\|_{1,t} \le C_2\|Th\|_1$$

where $C_2 := 1/c_p$ and $c_p := \frac{c_w c_f}{2}\min\{1 - w_2, w_1\}\min\{(C_F - 1)/2, 1\}$.


Proof. Take any increasing continuously differentiable function $h \in L^1[0,1]$ such that
$\|h\|_{1,t} = 1$. Define $M(w) := E[h(X)|W = w]$ for all $w \in [0,1]$ and note that

\[
\|Th\|_1 = \int_0^1 |M(w) f_W(w)|\,dw \ge c_w \int_0^1 |M(w)|\,dw,
\]

where the inequality follows from Assumption 2(iii). Therefore, the asserted claim follows
if we can show that $\int_0^1 |M(w)|\,dw$ is bounded away from zero by a constant that depends
only on $\zeta$.

First, note that $M(w)$ is increasing. This is because, by integration by parts,

\[
M(w) = \int_0^1 h(x) f_{X|W}(x|w)\,dx = h(1) - \int_0^1 Dh(x) F_{X|W}(x|w)\,dx,
\]

so that condition (4) of Assumption 1 and $Dh(x) \ge 0$ for all $x$ imply that the function
$M(w)$ is increasing.
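
The integration-by-parts representation of $M(w)$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the Gaussian design referred to as Example 1 in the paper, $F_{X|W}(x|w) = \Phi((\Phi^{-1}(x) - \rho\Phi^{-1}(w))/\sqrt{1-\rho^2})$, with an assumed $\rho = 0.3$ and the assumed test function $h(x) = x^2$; the function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution (cdf, inv_cdf, pdf)
RHO = 0.3           # assumed correlation, chosen only for illustration


def cond_cdf(x: float, w: float) -> float:
    """F_{X|W}(x|w) for X = Phi(rho*Phi^{-1}(W) + sqrt(1-rho^2)*U), U ~ N(0,1)."""
    z = (STD.inv_cdf(x) - RHO * STD.inv_cdf(w)) / (1.0 - RHO**2) ** 0.5
    return STD.cdf(z)


def m_by_parts(w: float, n: int = 4000) -> float:
    """M(w) = h(1) - int_0^1 Dh(x) F(x|w) dx with h(x) = x^2 (midpoint rule)."""
    step = 1.0 / n
    total = sum(2.0 * x * cond_cdf(x, w) for x in ((i + 0.5) * step for i in range(n)))
    return 1.0 - total * step


def m_direct(w: float, n: int = 4000, lim: float = 8.0) -> float:
    """M(w) = E[h(X)|W=w], integrating h over the independent normal U directly."""
    step = 2.0 * lim / n
    zw = RHO * STD.inv_cdf(w)
    s = (1.0 - RHO**2) ** 0.5
    total = 0.0
    for i in range(n):
        u = -lim + (i + 0.5) * step
        total += STD.cdf(zw + s * u) ** 2 * STD.pdf(u)
    return total * step


grid = [0.1 * k for k in range(1, 10)]
values = [m_by_parts(w) for w in grid]  # should be increasing in w
```

Both routes to $M(w)$ should agree up to quadrature error, and `values` should be increasing along the grid, in line with condition (4) of Assumption 1 and $Dh \ge 0$.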

Consider the case in which $h(x) \ge 0$ for all $x \in [0,1]$. Then $M(w) \ge 0$ for all $w \in [0,1]$.
Therefore,

\begin{align*}
\int_0^1 |M(w)|\,dw &\ge \int_{w_2}^1 |M(w)|\,dw \ge (1 - w_2) M(w_2) = (1 - w_2) \int_0^1 h(x) f_{X|W}(x|w_2)\,dx \\
&\ge (1 - w_2) \int_{x_1}^{x_2} h(x) f_{X|W}(x|w_2)\,dx \ge (1 - w_2) c_f \int_{x_1}^{x_2} h(x)\,dx \\
&= (1 - w_2) c_f \|h\|_{1,t} = (1 - w_2) c_f > 0
\end{align*}

by Assumption 2(ii). Similarly,

\[
\int_0^1 |M(w)|\,dw \ge w_1 c_f > 0
\]


when $h(x) \le 0$ for all $x \in [0,1]$. Therefore, it remains to consider the case in which there
exists $x^* \in (0,1)$ such that $h(x) \le 0$ for $x \le x^*$ and $h(x) \ge 0$ for $x > x^*$. Since $h(x)$ is
continuous, $h(x^*) = 0$, and so integration by parts yields

\begin{align*}
M(w) &= \int_0^{x^*} h(x) f_{X|W}(x|w)\,dx + \int_{x^*}^1 h(x) f_{X|W}(x|w)\,dx \\
&= -\int_0^{x^*} Dh(x) F_{X|W}(x|w)\,dx + \int_{x^*}^1 Dh(x)(1 - F_{X|W}(x|w))\,dx. \tag{29}
\end{align*}

For $k = 1,2$, let $A_k := \int_{x^*}^1 Dh(x)(1 - F_{X|W}(x|w_k))\,dx$ and $B_k := \int_0^{x^*} Dh(x) F_{X|W}(x|w_k)\,dx$,
so that $M(w_k) = A_k - B_k$.

Consider the following three cases separately, depending on where $x^*$ lies relative to
$x_1$ and $x_2$.


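Case I ($x_1 < x^* < x_2$): First, we have

\begin{align*}
A_1 + B_2 &= \int_{x^*}^1 Dh(x)(1 - F_{X|W}(x|w_1))\,dx + \int_0^{x^*} Dh(x) F_{X|W}(x|w_2)\,dx \\
&= \int_{x^*}^1 h(x) f_{X|W}(x|w_1)\,dx - \int_0^{x^*} h(x) f_{X|W}(x|w_2)\,dx \\
&\ge \int_{x^*}^{x_2} h(x) f_{X|W}(x|w_1)\,dx - \int_{x_1}^{x^*} h(x) f_{X|W}(x|w_2)\,dx \\
&\ge c_f \int_{x^*}^{x_2} h(x)\,dx + c_f \int_{x_1}^{x^*} |h(x)|\,dx = c_f \int_{x_1}^{x_2} |h(x)|\,dx \\
&= c_f \|h\|_{1,t} = c_f > 0, \tag{30}
\end{align*}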


where the fourth line follows from Assumption 2(ii). Second, by (4) and (5) of
Assumption 1,

\begin{align*}
M(w_1) &= \int_{x^*}^1 Dh(x)(1 - F_{X|W}(x|w_1))\,dx - \int_0^{x^*} Dh(x) F_{X|W}(x|w_1)\,dx \\
&\le \int_{x^*}^1 Dh(x)(1 - F_{X|W}(x|w_2))\,dx - C_F \int_0^{x^*} Dh(x) F_{X|W}(x|w_2)\,dx \\
&= A_2 - C_F B_2
\end{align*}

so that, together with $M(w_2) = A_2 - B_2$, we obtain

\[
M(w_2) - M(w_1) \ge (C_F - 1) B_2. \tag{31}
\]

Similarly, by (4) and (6) of Assumption 1,

\begin{align*}
M(w_2) &= \int_{x^*}^1 Dh(x)(1 - F_{X|W}(x|w_2))\,dx - \int_0^{x^*} Dh(x) F_{X|W}(x|w_2)\,dx \\
&\ge C_F \int_{x^*}^1 Dh(x)(1 - F_{X|W}(x|w_1))\,dx - \int_0^{x^*} Dh(x) F_{X|W}(x|w_1)\,dx \\
&= C_F A_1 - B_1
\end{align*}

so that, together with $M(w_1) = A_1 - B_1$, we obtain

\[
M(w_2) - M(w_1) \ge (C_F - 1) A_1. \tag{32}
\]

In conclusion, equations (30), (31), and (32) yield

\[
M(w_2) - M(w_1) \ge (C_F - 1)(A_1 + B_2)/2 \ge (C_F - 1) c_f / 2 > 0. \tag{33}
\]

Consider the case $M(w_1) \ge 0$ and $M(w_2) \ge 0$. Then $M(w_2) \ge M(w_2) - M(w_1)$ and thus

\[
\int_0^1 |M(w)|\,dw \ge \int_{w_2}^1 |M(w)|\,dw \ge (1 - w_2) M(w_2) \ge (1 - w_2)(C_F - 1) c_f / 2 > 0. \tag{34}
\]

Similarly,

\[
\int_0^1 |M(w)|\,dw \ge w_1 (C_F - 1) c_f / 2 > 0 \tag{35}
\]

when $M(w_1) \le 0$ and $M(w_2) \le 0$.

Finally, consider the case $M(w_1) \le 0$ and $M(w_2) \ge 0$. If $M(w_2) \ge |M(w_1)|$, then
$M(w_2) \ge (M(w_2) - M(w_1))/2$ and the same argument as in (34) shows that

\[
\int_0^1 |M(w)|\,dw \ge (1 - w_2)(C_F - 1) c_f / 4.
\]

If $|M(w_1)| \ge M(w_2)$, then $|M(w_1)| \ge (M(w_2) - M(w_1))/2$ and we obtain

\[
\int_0^1 |M(w)|\,dw \ge \int_0^{w_1} |M(w)|\,dw \ge w_1 (C_F - 1) c_f / 4 > 0.
\]

This completes the proof of Case I.
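
The halving step in the mixed-sign case can be spelled out as follows. This is a short worked derivation rather than part of the original argument; it writes $C_F$ for the constant in conditions (5) and (6) of Assumption 1 and $c_f$ for the density lower bound in Assumption 2(ii), and uses only $M(w_1) \le 0 \le M(w_2)$, the monotonicity of $M$, and (33).

```latex
\begin{align*}
% M(w_1) <= 0, hence -M(w_1) = |M(w_1)|
M(w_2) - M(w_1) &= M(w_2) + |M(w_1)| \le 2\max\{M(w_2), |M(w_1)|\}. \\
% If the maximum is M(w_2): M is increasing, so M(w) >= M(w_2) on [w_2, 1]
\int_0^1 |M(w)|\,dw &\ge (1 - w_2)\,M(w_2)
  \ge (1 - w_2)\,\frac{M(w_2) - M(w_1)}{2}
  \ge (1 - w_2)\,\frac{(C_F - 1)c_f}{4}. \\
% If the maximum is |M(w_1)|: M(w) <= M(w_1) <= 0 on [0, w_1]
\int_0^1 |M(w)|\,dw &\ge w_1\,|M(w_1)|
  \ge w_1\,\frac{M(w_2) - M(w_1)}{2}
  \ge w_1\,\frac{(C_F - 1)c_f}{4}.
\end{align*}
```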


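Case II ($x_2 \le x^*$): Suppose $M(w_1) \ge -c_f/2$. As in Case I, we have $M(w_2) \ge C_F A_1 - B_1$.
Together with $M(w_1) = A_1 - B_1$, this inequality yields

\begin{align*}
M(w_2) - M(w_1) &= M(w_2) - C_F M(w_1) + C_F M(w_1) - M(w_1) \\
&\ge (C_F - 1) B_1 + (C_F - 1) M(w_1) \\
&= (C_F - 1) \Big( \int_0^{x^*} Dh(x) F_{X|W}(x|w_1)\,dx + M(w_1) \Big) \\
&= (C_F - 1) \Big( \int_0^{x^*} |h(x)| f_{X|W}(x|w_1)\,dx + M(w_1) \Big) \\
&\ge (C_F - 1) \Big( \int_{x_1}^{x_2} |h(x)| f_{X|W}(x|w_1)\,dx - \frac{c_f}{2} \Big) \\
&\ge (C_F - 1) \Big( c_f \int_{x_1}^{x_2} |h(x)|\,dx - \frac{c_f}{2} \Big) = \frac{(C_F - 1) c_f}{2} > 0.
\end{align*}

With this inequality we proceed as in Case I to show that $\int_0^1 |M(w)|\,dw$ is bounded
from below by a positive constant that depends only on $\zeta$. On the other hand, when
$M(w_1) \le -c_f/2$ we bound $\int_0^1 |M(w)|\,dw$ as in (35), and the proof of Case II is complete.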

Case III ($x^* \le x_1$): Similarly as in Case II, suppose first that $M(w_2) \le c_f/2$. As in
Case I we have $M(w_1) \le A_2 - C_F B_2$ so that together with $M(w_2) = A_2 - B_2$,

\begin{align*}
M(w_2) - M(w_1) &= M(w_2) - C_F M(w_2) + C_F M(w_2) - M(w_1) \\
&\ge (1 - C_F) M(w_2) + (C_F - 1) A_2 \\
&= (C_F - 1) \Big( \int_{x^*}^1 Dh(x)(1 - F_{X|W}(x|w_2))\,dx - M(w_2) \Big) \\
&= (C_F - 1) \Big( \int_{x^*}^1 h(x) f_{X|W}(x|w_2)\,dx - M(w_2) \Big) \\
&\ge (C_F - 1) \Big( \int_{x_1}^{x_2} h(x) f_{X|W}(x|w_2)\,dx - M(w_2) \Big) \\
&\ge (C_F - 1) \Big( c_f \int_{x_1}^{x_2} h(x)\,dx - \frac{c_f}{2} \Big) = \frac{(C_F - 1) c_f}{2} > 0
\end{align*}

and we proceed as in Case I to bound $\int_0^1 |M(w)|\,dw$ from below by a positive constant
that depends only on $\zeta$. On the other hand, when $M(w_2) > c_f/2$, we bound $\int_0^1 |M(w)|\,dw$
as in (34), and the proof of Case III is complete. The lemma is proven. Q.E.D.


Proof of Corollary 1. Note that since $\tau(a') \le \tau(a'')$ whenever $a' \le a''$, the claim for
$a \le 0$ follows from $\tau(a) \le \tau(0) \le C$, where the second inequality holds by Theorem 1.
Therefore, assume that $a > 0$. Fix any $\alpha \in (0,1)$. Take any function $h \in \mathcal{H}(a)$ such
that $\|h\|_{2,t} = 1$. Set $h'(x) = ax$ for all $x \in [0,1]$. Note that the function $x \mapsto h(x) + ax$
is increasing and so belongs to the class $\mathcal{M}$. Also, $\|h'\|_{2,t} \le \|h'\|_2 \le a/\sqrt{3}$. Thus, the
bound (36) in Lemma 3 below applies whenever $(1 + C\|T\|_2) a/\sqrt{3} \le \alpha$. Therefore, for all
$a$ satisfying the inequality

\[
a \le \frac{\sqrt{3}\,\alpha}{1 + C\|T\|_2},
\]

we have $\tau(a) \le C/(1 - \alpha)$. This completes the proof of the corollary. Q.E.D.
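
For reference, the threshold on $a$ comes from a one-line norm computation. The display below is a worked version of that step (not part of the original text); it uses only $h'(x) = ax$, the normalization $\|h\|_{2,t} = 1$, and the condition of Lemma 3, with $\alpha$ denoting the number fixed in $(0,1)$.

```latex
\begin{align*}
\|h'\|_2^2 &= \int_0^1 (ax)^2\,dx = \frac{a^2}{3},
  \qquad\text{so}\qquad \|h'\|_{2,t} \le \|h'\|_2 = \frac{a}{\sqrt{3}}, \\
\|h'\|_{2,t} + C\|T\|_2\,\|h'\|_2
  &\le \big(1 + C\|T\|_2\big)\frac{a}{\sqrt{3}}
  \le \alpha = \alpha\,\|h\|_{2,t}
  \quad\text{whenever}\quad a \le \frac{\sqrt{3}\,\alpha}{1 + C\|T\|_2}.
\end{align*}
```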


Lemma 3. Let Assumptions 1 and 2 be satisfied. Consider any function $h \in L^2[0,1]$. If
there exist $h' \in L^2[0,1]$ and $\alpha \in (0,1)$ such that $h + h' \in \mathcal{M}$ and
$\|h'\|_{2,t} + C\|T\|_2 \|h'\|_2 \le \alpha \|h\|_{2,t}$, then

\[
\|h\|_{2,t} \le \frac{C}{1 - \alpha} \|Th\|_2 \tag{36}
\]

for the constant $C$ defined in Theorem 1.


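Proof. Define

\[
\tilde h(x) := \frac{h(x) + h'(x)}{\|h\|_{2,t} - \|h'\|_{2,t}}, \qquad x \in [0,1].
\]

By assumption, $\|h'\|_{2,t} < \|h\|_{2,t}$, and so the triangle inequality yields

\[
\|\tilde h\|_{2,t} \ge \frac{\|h\|_{2,t} - \|h'\|_{2,t}}{\|h\|_{2,t} - \|h'\|_{2,t}} = 1.
\]

Therefore, since $\tilde h \in \mathcal{M}$, Theorem 1 gives

\[
\|T\tilde h\|_2 \ge \|\tilde h\|_{2,t}/C \ge 1/C.
\]

Hence, applying the triangle inequality once again yields

\begin{align*}
\|Th\|_2 &\ge (\|h\|_{2,t} - \|h'\|_{2,t}) \|T\tilde h\|_2 - \|Th'\|_2
  \ge (\|h\|_{2,t} - \|h'\|_{2,t}) \|T\tilde h\|_2 - \|T\|_2 \|h'\|_2 \\
&\ge \frac{\|h\|_{2,t} - \|h'\|_{2,t}}{C} - \|T\|_2 \|h'\|_2
  = \frac{\|h\|_{2,t}}{C} \Big( 1 - \frac{\|h'\|_{2,t} + C\|T\|_2 \|h'\|_2}{\|h\|_{2,t}} \Big).
\end{align*}

Since the expression in the last parentheses is bounded from below by $1 - \alpha$ by assumption,
we obtain the inequality

\[
\|Th\|_2 \ge \frac{1 - \alpha}{C} \|h\|_{2,t},
\]

which is equivalent to (36). Q.E.D.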


B Proofs for Section 3 

In this section, we use $C$ to denote a strictly positive constant, whose value may change
from place to place. Also, we use $E_n[\cdot]$ to denote the average over index $i = 1, \dots, n$; for
example, $E_n[X_i] = n^{-1} \sum_{i=1}^n X_i$.

Proof of Lemma 1. Observe that if $D\hat g^u(x) \ge 0$ for all $x \in [0,1]$, then $\hat g^c$ coincides with
$\hat g^u$, so that to prove (14), it suffices to show that

\[
P\big( D\hat g^u(x) \ge 0 \text{ for all } x \in [0,1] \big) \to 1 \text{ as } n \to \infty. \tag{37}
\]

In turn, (37) follows if

\[
\sup_{x \in [0,1]} |D\hat g^u(x) - Dg(x)| = o_p(1) \tag{38}
\]

since $Dg(x) \ge c_g$ for all $x \in [0,1]$ and some $c_g > 0$.

To prove (38), define a function $\hat m \in L^2[0,1]$ by

\[
\hat m(w) = q(w)' E_n[q(W_i) Y_i], \qquad w \in [0,1], \tag{39}
\]

and an operator $\hat T : L^2[0,1] \to L^2[0,1]$ by

\[
(\hat T h)(w) = q(w)' E_n[q(W_i) p(X_i)'] E[p(U) h(U)], \qquad w \in [0,1],\ h \in L^2[0,1].
\]

Throughout the proof, we assume that the events

\begin{align*}
\|E_n[q(W_i) p(X_i)'] - E[q(W) p(X)']\| &\le C (\xi_n^2 \log n / n)^{1/2}, \tag{40} \\
\|E_n[q(W_i) q(W_i)'] - E[q(W) q(W)']\| &\le C (\xi_n^2 \log n / n)^{1/2}, \tag{41} \\
\|E_n[q(W_i) g_n(X_i)] - E[q(W) g_n(X)]\| &\le C (J/(\alpha n))^{1/2}, \tag{42} \\
\|\hat m - m\|_2 &\le C \big( (J/(\alpha n))^{1/2} + \tau_n^{-1} J^{-s} \big) \tag{43}
\end{align*}

hold for some sufficiently large constant $0 < C < \infty$. It follows from Markov's inequality
and Lemmas 4 and 10 that all four events hold jointly with probability at least $1 - \alpha - n^{-1}$
since the constant $C$ is large enough.

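Next, we derive a bound on $\|\hat g^u - g_n\|_2$. By the definition of $\tau_n$,

\begin{align*}
\|\hat g^u - g_n\|_2 &\le \tau_n \|T(\hat g^u - g_n)\|_2 \\
&\le \tau_n \|T(\hat g^u - g)\|_2 + \tau_n \|T(g - g_n)\|_2 \le \tau_n \|T(\hat g^u - g)\|_2 + C_g K^{-s},
\end{align*}

where the second inequality follows from the triangle inequality, and the third inequality
from Assumption 6(iii). Next, since $m = Tg$,

\[
\|T(\hat g^u - g)\|_2 \le \|(T - T_n)\hat g^u\|_2 + \|(T_n - \hat T)\hat g^u\|_2 + \|\hat T \hat g^u - \hat m\|_2 + \|\hat m - m\|_2
\]

by the triangle inequality. The bound on $\|\hat m - m\|_2$ is given in (43). Also, since $\|\hat g^u\|_2 \le C_b$
by construction,

\[
\|(T - T_n)\hat g^u\|_2 \le C_b C_a \tau_n^{-1} K^{-s}
\]

by Assumption 8(ii). In addition, by the triangle inequality,

\begin{align*}
\|(T_n - \hat T)\hat g^u\|_2 &\le \|(T_n - \hat T)(\hat g^u - g_n)\|_2 + \|(T_n - \hat T) g_n\|_2 \\
&\le \|T_n - \hat T\|_2 \|\hat g^u - g_n\|_2 + \|(T_n - \hat T) g_n\|_2.
\end{align*}

Moreover,

\[
\|T_n - \hat T\|_2 = \|E_n[q(W_i) p(X_i)'] - E[q(W) p(X)']\| \le C (\xi_n^2 \log n / n)^{1/2}
\]

by (40), and

\[
\|(T_n - \hat T) g_n\|_2 = \|E_n[q(W_i) g_n(X_i)] - E[q(W) g_n(X)]\| \le C (J/(\alpha n))^{1/2}
\]

by (42).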

Further, by Assumption 2(iii), all eigenvalues of $E[q(W) q(W)']$ are bounded from below
by $c_w$ and from above by $C_w$, and so it follows from (41) that for large $n$, all eigenvalues
of $Q_n := E_n[q(W_i) q(W_i)']$ are bounded below from zero and from above. Therefore,

\begin{align*}
\|\hat T \hat g^u - \hat m\|_2 &= \|E_n[q(W_i)(p(X_i)'\hat\beta^u - Y_i)]\| \\
&\le C \big\| E_n[(Y_i - p(X_i)'\hat\beta^u) q(W_i)']\, Q_n^{-1}\, E_n[q(W_i)(Y_i - p(X_i)'\hat\beta^u)] \big\|^{1/2} \\
&\le C \big\| E_n[(Y_i - p(X_i)'\beta_n) q(W_i)']\, Q_n^{-1}\, E_n[q(W_i)(Y_i - p(X_i)'\beta_n)] \big\|^{1/2} \\
&\le C \|E_n[q(W_i)(p(X_i)'\beta_n - Y_i)]\|
\end{align*}

by optimality of $\hat\beta^u$. Moreover,

\begin{align*}
\|E_n[q(W_i)(p(X_i)'\beta_n - Y_i)]\| &\le \|(\hat T - T_n) g_n\|_2 + \|(T_n - T) g_n\|_2 \\
&\quad + \|T(g_n - g)\|_2 + \|m - \hat m\|_2
\end{align*}

by the triangle inequality. The terms $\|(\hat T - T_n) g_n\|_2$ and $\|m - \hat m\|_2$ have been bounded
above. Also, by Assumptions 8(ii) and 6(iii),

\[
\|(T_n - T) g_n\|_2 \le C \tau_n^{-1} K^{-s}, \qquad \|T(g - g_n)\|_2 \le C_g \tau_n^{-1} K^{-s}.
\]

Combining the inequalities above shows that the inequality

\[
\|\hat g^u - g_n\|_2 \le C \Big( \tau_n (J/(\alpha n))^{1/2} + K^{-s} + \tau_n (\xi_n^2 \log n / n)^{1/2} \|\hat g^u - g_n\|_2 \Big) \tag{44}
\]

holds with probability at least $1 - \alpha - n^{-1}$. Since $\tau_n^2 \xi_n^2 \log n / n \to 0$, it follows that with
the same probability,

\[
\|\hat\beta^u - \beta_n\| = \|\hat g^u - g_n\|_2 \le C \big( \tau_n (J/(\alpha n))^{1/2} + K^{-s} \big),
\]

and so by the triangle inequality,

\begin{align*}
|D\hat g^u(x) - Dg(x)| &\le |D\hat g^u(x) - Dg_n(x)| + |Dg_n(x) - Dg(x)| \\
&\le C \sup_{x \in [0,1]} \|Dp(x)\| \big( \tau_n (K/(\alpha n))^{1/2} + K^{-s} \big) + o(1)
\end{align*}

uniformly over $x \in [0,1]$ since $J \le C_J K$ by Assumption 5. Since by the conditions of the
lemma, $\sup_{x \in [0,1]} \|Dp(x)\| (\tau_n (K/n)^{1/2} + K^{-s}) \to 0$, (38) follows by taking $\alpha = \alpha_n \to 0$
slowly enough. This completes the proof of the lemma. Q.E.D.

Proof of Theorem 2. Consider the event that inequalities (40)-(43) hold for some
sufficiently large constant $C$. As in the proof of Lemma 1, this event occurs with probability
at least $1 - \alpha - n^{-1}$. Also, applying the same arguments as those in the proof of Lemma
1 with $\hat g^c$ replacing $\hat g^u$ and using the bound

\[
\|(T_n - \hat T)\hat g^c\|_2 \le \|T_n - \hat T\|_2 \|\hat g^c\|_2 \le C_b \|T_n - \hat T\|_2
\]

instead of the bound for $\|(T_n - \hat T)\hat g^u\|_2$ used in the proof of Lemma 1, it follows that on
this event,

\[
\|T(\hat g^c - g_n)\|_2 \le C \big( (K/(\alpha n))^{1/2} + (\xi_n^2 \log n / n)^{1/2} + \tau_n^{-1} K^{-s} \big). \tag{45}
\]

Further,

\[
\|\hat g^c - g_n\|_{2,t} \le \delta + \tau_{n,t}\big( \|Dg_n\|_\infty / \delta \big) \|T(\hat g^c - g_n)\|_2
\]

since $\hat g^c$ is increasing (indeed, if $\|\hat g^c - g_n\|_{2,t} \le \delta$, the bound is trivial; otherwise,
apply the definition of $\tau_{n,t}$ to the function $(\hat g^c - g_n)/\|\hat g^c - g_n\|_{2,t}$ and use the inequality
$\tau_{n,t}(\|Dg_n\|_\infty / \|\hat g^c - g_n\|_{2,t}) \le \tau_{n,t}(\|Dg_n\|_\infty / \delta)$). Finally, by the triangle inequality,

\[
\|\hat g^c - g\|_{2,t} \le \|\hat g^c - g_n\|_{2,t} + \|g_n - g\|_{2,t} \le \|\hat g^c - g_n\|_{2,t} + C_g K^{-s}.
\]

Combining these inequalities gives the asserted claim (15).

To prove (16), observe that combining (45) and Assumption 6(iii) and applying the 
triangle inequality shows that with probability at least 1 — a — n~^, 

\\T{T - 9 )h< + (^^ logn/n)^/^ + , 

which, by the same argument as that used to prove (15), gives 


Iir-^l|2,<q^ + r 


s 




(K , il\ogn\^/‘^ 


V-^ 

\an 


n 




+ /r 


(46) 


The asserted claim (16) now follows by applying (15) with 5 = 0 and (46) with 6 = 
ll-Dfi'lloo/cr and using Corollary 1 to bound This completes the proof of the theorem. 

Q.E.D. 


Lemma 4. Under conditions of Theorem 2, $\|\hat m - m\|_2 \le C((J/(\alpha n))^{1/2} + \tau_n^{-1} J^{-s})$ with
probability at least $1 - \alpha$ where $\hat m$ is defined in (39).

Proof. Using the triangle inequality and an elementary inequality $(a + b)^2 \le 2a^2 + 2b^2$ for
all $a, b \ge 0$,

\[
\|E_n[q(W_i) Y_i] - E[q(W) g(X)]\|^2 \le 2 \|E_n[q(W_i) \varepsilon_i]\|^2 + 2 \|E_n[q(W_i) g(X_i)] - E[q(W) g(X)]\|^2.
\]

To bound the first term on the right-hand side of this inequality, we have

\[
E\big[ \|E_n[q(W_i) \varepsilon_i]\|^2 \big] = n^{-1} E[\|q(W) \varepsilon\|^2] \le (C_B/n) E[\|q(W)\|^2] \le CJ/n,
\]

where the first and the second inequalities follow from Assumptions 4 and 2, respectively.
Similarly,

\begin{align*}
E\big[ \|E_n[q(W_i) g(X_i)] - E[q(W) g(X)]\|^2 \big] &\le n^{-1} E[\|q(W) g(X)\|^2] \\
&\le (C_B/n) E[\|q(W)\|^2] \le CJ/n
\end{align*}

by Assumption 4. Therefore, denoting $\bar m_n(w) := q(w)' E[q(W) g(X)]$ for all $w \in [0,1]$, we
obtain

\[
E[\|\hat m - \bar m_n\|_2^2] \le CJ/n,
\]

and so by Markov's inequality, $\|\hat m - \bar m_n\|_2 \le C (J/(\alpha n))^{1/2}$ with probability at least $1 - \alpha$.
Further, using $\gamma_n \in \mathbb{R}^J$ from Assumption 7, so that $m_n(w) = q(w)'\gamma_n$ for all $w \in [0,1]$,
and denoting $r_n(w) := m(w) - m_n(w)$ for all $w \in [0,1]$, we obtain

\begin{align*}
\bar m_n(w) &= q(w)' \int_0^1 \int_0^1 q(t) g(x) f_{X,W}(x,t)\,dx\,dt \\
&= q(w)' \int_0^1 q(t) m(t)\,dt = q(w)' \int_0^1 q(t) (q(t)'\gamma_n + r_n(t))\,dt \\
&= q(w)'\gamma_n + q(w)' \int_0^1 q(t) r_n(t)\,dt = m(w) - r_n(w) + q(w)' \int_0^1 q(t) r_n(t)\,dt.
\end{align*}

Hence, by the triangle inequality,

\[
\|\bar m_n - m\|_2 \le \|r_n\|_2 + \Big\| \int_0^1 q(t) r_n(t)\,dt \Big\| \le 2 \|r_n\|_2 \le 2 C_m \tau_n^{-1} J^{-s}
\]

by Bessel's inequality and Assumption 7. Applying the triangle inequality one more time,
we obtain

\[
\|\hat m - m\|_2 \le \|\hat m - \bar m_n\|_2 + \|\bar m_n - m\|_2 \le C \big( (J/(\alpha n))^{1/2} + \tau_n^{-1} J^{-s} \big)
\]

with probability at least $1 - \alpha$. This completes the proof of the lemma. Q.E.D.
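
The Markov step in the proof of Lemma 4 can be written out explicitly. The display below is a worked version of that step only; $\hat m$ is the estimator in (39), $\bar m_n(w) := q(w)'E[q(W)g(X)]$ is shorthand for its population counterpart, and $C$ is the constant from the second-moment bound.

```latex
\begin{align*}
P\Big( \|\hat m - \bar m_n\|_2 > \big(CJ/(\alpha n)\big)^{1/2} \Big)
  &= P\Big( \|\hat m - \bar m_n\|_2^2 > \frac{CJ}{\alpha n} \Big) \\
  &\le \frac{E\big[\|\hat m - \bar m_n\|_2^2\big]}{CJ/(\alpha n)}
   \le \frac{CJ/n}{CJ/(\alpha n)} = \alpha ,
\end{align*}
% so the complementary event has probability at least 1 - alpha.
```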

Proof of Corollary 2. The corollary follows immediately from Theorem 2. Q.E.D. 


C Proofs for Section 4 


Let $\mathcal{M}_\uparrow$ be the set of all functions in $\mathcal{M}$ that are increasing but not constant. Similarly,
let $\mathcal{M}_\downarrow$ be the set of all functions in $\mathcal{M}$ that are decreasing but not constant, and let $\mathcal{M}_\rightarrow$
be the set of all constant functions in $\mathcal{M}$.


Proof of Theorem 3. Assume that $g$ is increasing but not constant, that is, $g \in \mathcal{M}_\uparrow$.
Define $M(w) := E[Y|W = w]$, $w \in [0,1]$. Below we show that $M \in \mathcal{M}_\uparrow$. To prove it,
observe that, as in the proof of Lemma 2, integration by parts gives

\[
M(w) = g(1) - \int_0^1 Dg(x) F_{X|W}(x|w)\,dx,
\]

and so Assumption 1 implies that $M$ is increasing. Let us show that $M$ is not constant.
To this end, note that

\[
M(w_2) - M(w_1) = \int_0^1 Dg(x) \big( F_{X|W}(x|w_1) - F_{X|W}(x|w_2) \big)\,dx.
\]

Since $g$ is not constant and is continuously differentiable, there exists $\bar x \in (0,1)$ such that
$Dg(\bar x) > 0$. Also, since $0 \le x_1 < x_2 \le 1$ (the constants $x_1$ and $x_2$ appear in Assumption
1), we have $\bar x \in (0, x_2)$ or $\bar x \in (x_1, 1)$. In the first case,

\[
M(w_2) - M(w_1) \ge \int_0^{x_2} (C_F - 1) Dg(x) F_{X|W}(x|w_2)\,dx > 0.
\]

In the second case,

\[
M(w_2) - M(w_1) \ge \int_{x_1}^1 (C_F - 1) Dg(x) (1 - F_{X|W}(x|w_1))\,dx > 0.
\]

Thus, $M$ is not constant, and so $M \in \mathcal{M}_\uparrow$. Similarly, one can show that if $g \in \mathcal{M}_\downarrow$,
then $M \in \mathcal{M}_\downarrow$, and if $g \in \mathcal{M}_\rightarrow$, then $M \in \mathcal{M}_\rightarrow$. However, the distribution of the triple
$(Y, X, W)$ uniquely determines whether $M \in \mathcal{M}_\uparrow$, $\mathcal{M}_\downarrow$, or $\mathcal{M}_\rightarrow$, and so it also uniquely
determines whether $g \in \mathcal{M}_\uparrow$, $\mathcal{M}_\downarrow$, or $\mathcal{M}_\rightarrow$. This completes the proof. Q.E.D.


Proof of Theorem 4. Suppose $g'$ and $g''$ are observationally equivalent. Then
$\|T(g' - g'')\|_2 = 0$. On the other hand, since $0 \le \|h\|_{2,t} + C\|T\|_2 \|h\|_2 < \|g' - g''\|_{2,t}$, there
exists $\alpha \in (0,1)$ such that $\|h\|_{2,t} + C\|T\|_2 \|h\|_2 \le \alpha \|g' - g''\|_{2,t}$. Therefore, by Lemma 3,
$\|T(g' - g'')\|_2 \ge \|g' - g''\|_{2,t} (1 - \alpha)/C > 0$, which is a contradiction. This completes the
proof of the theorem. Q.E.D.


D Proofs for Section 5 


Proof of Theorem 5. In this proof, c and C are understood as sufficiently small and large 
constants, respectively, whose values may change at each appearance but can be chosen 
to depend only on cw, Cw, Ch, Cr, cp, (7^, q, and the kernel K. 

To prove the asserted claims, we apply Corollary 3.1, Case (E.3), from CCK conditional
on $\mathcal{W}_n = \{W_1, \dots, W_n\}$. Under $H_0$,


T < max 

{x,W,h)GXnXWnXBr 


Er=i hhHiHXi < x} - Fx\w{x\Wi)) 




1/2 


= :Tn 


(47) 


with equality if the functions w i— )■ Fx\w{x\w) are constant for all x G (0,1). Using the 
notation of CCK, 


To = max —= 
i<i<p \/n 


i=l 
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where p = \Xn ^ hVn x i3„|, the number of elements in the set Xn x Wn x Xij = 
ZijSij with Zij having the form y/nki^h{w)/^ and Sij having the form 
l{^i < a;} — Fx|vi/(a;|hh’i) for some (x, ta, h) E Xn x Wn x Bn- The dimension p satishes 
logp < Clogn. Also, = 1- Further, since 0 < l{Xi < x} < 1, we have 

\eij\ < 1, and so E[exp(|£jj|/2)|>Vn] < 2. In addition, E[£^^|Wn] > Ce(l — Ce) > 0 by 
Assumption 12. Thus, Tq satishes the conditions of Case (E.3) in CCK with a sequence 
of constants Bn as long as \zij\ < Bn for all j = In turn. Proposition B.2 

in Chetverikov ( 2012 ) shows that under Assumptions 2 , 9, and 10, with probability at 
least 1 — Cn~^, Zij < C/y/hYYa ='■ Bn uniformly over all j = 1,... ,p (Proposition B.2 in 
Chetverikov (2012) is stated with “w.p.a.l” replacing “1 —however, inspecting the 
proof of Proposition B.2 (and supporting Lemma H.l) shows that the result applies with 
“1 — Cn~^” instead of “w.p.a.l”). Let Bi^n denote the event that \zij\ < C/y/h^Yn = Bn 
for all j = 1 ,... ,p. As we just established, P(i3i,„) > 1 — Cn~^. Since (logn)^/(n/i min ) < 
Chn~^'^ by Assumption 10, we have that i?^(logn)^/n < Cn~'^, and so condition (i) of 
Corollary 3.1 in CCK is satished on the event Bi^n- 
Let B 2 ^n denote the event that 

P ( max \Fx\w{x\^) - Fx\w{x\^)\ > . 

\(3;,-u))eA’„xWn J 

By Assumption 11, P(i 32 ,n) > 1 — CFn~^^. We apply Corollary 3.1 from CCK conditional 
on Wn on the event Bi^n H B 2 ,n- For this, we need to show that on the event 132 ,n, 
Ci,n-\/logn + C 2 ,n < Cn~^ where Ci,n and (^ 2 ,n are positive sequences such that 

P (Pe(|T' - To'l > Cl,n) > C2,n|Wn) < C2,n (48) 


where 


rjih _ 

-'o •“ 


max 

{x,W,h)£XnXWnXS, 


Er=i {ki,h{w){l{Xi <x}- Fx\w{x\Wi))) 


(EUKiMX" 

and where Pe(-) denotes the probability distribution with respect to the distribution of 
ei,... ,en and keeping everything else hxed. To hnd such sequences (Ci,n and C 2 ,n, note 
that Cl,n A/log n + C2,n < Cn~^ follows from Ci,n + ( 2,71 < Cn~^ (with different constants 
c, C > 0), so that it suffices to verify the latter condition. Also, 

Etl(^^khH{Fxlw{xm) - Fx\w{xm)) 


IT'’ - T”| < max 

{x,W,h)&Xr,XWnXBn 


(Er=i^./AH 


p /2 


For hxed hFi,..., IFn and Xi,... ,Xn, the random variables under the modulus on the 
right-hand side of this inequality are normal with zero mean and variance bounded from 
above by maxp,u,)6A'„xw„ \Fx\w{x\'>^) - Fx\w{x\wW- Therefore, 


|T'’-To'|>C^ 


n max 

{x,w)&XnXWn 




x\w] 


Xiw 


X w 


< Cn 
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Hence, on the event that 


max 

{x,w)eXnXWri 


Fx\w{ 


x\w} 


Fx\w{ 


x\wi 




whose conditional probability given Wn on B 2 ,n is at least 1 — by the dehnition 

of B 2 ,n, 

Pe (IT'' - T^\ > Cn-^) < Cn-^ 

implying that (48) holds for some Ci,n and C 2 ,n satisfying „ + C 2 ,n < Cn~^. 

Thus, applying Corollary 3.1, Case (E.3), from CCK conditional on {Wi ,..., Wn} on 
the event fl B 2 ,n gives 


a - Cn-" < P(To > c(a)|W„) < a + CW. 


Since P(i3i,n fl B 2 ,n) > 1 — Cn~'^, integrating this inequality over the distribution of 
Wn = {fPi,..., ITn} gives (25). Combining this inequality with (47) gives (24). This 
completes the proof of the theorem. Q.E.D. 


Proof of Theorem 6. Conditional on the data, the random variables 


T'’(x, tc, h) := 

for (x, tc, h) E Xn X Wn X B 
above by 


Eti e* [hh{w){l{X, < x} - Fxiwixm))) 

„ are normal with zero mean and variances bounded from 


Eti {hh{w){l{X, < x} - Fxiw{xm))y 

Er=i hh{wy 

< max max < x} — Txivi/(3;|lTi)^ Fii^ + Chf 

(x,n),h)eV„xW„xB„ l<i<n V ' / 


by Assumption 11. Therefore, c{a) < C'(logn)^/^ for some constant C > 0 since c{a) is the 
(1 — a) conditional quantile of T'' given the data, T'' = h}ex„xWnxB„ {x, w, h), 

and p := \Xn x Wn X Bn\, the number of elements of the set An x Wn x Bn, satishes 
logp < Clogn (with a possibly different constant (7 > 0). Thus, the growth rate of 
the critical value c{a) satishes the same upper bound (logn)^/^ as if we were testing 
monotonicity of one particular regression function w i—)■ E[1{X < xo}|lP = w] with An 
replaced by Xq for some Xq G (0,1) in the dehnition of T and T''. Hence, the asserted 
claim follows from the same arguments as those given in the proof of Theorem 4.2 in 
Chetverikov (2012). This completes the proof of the theorem. Q.E.D. 
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E Technical tools 


In this section, we provide a set of technical results that are used to prove the statements 
from the main text. 


Lemma 5. Let $W$ be a random variable with the density function bounded below from zero
on its support $[0,1]$, and let $M : [0,1] \to \mathbb{R}$ be a monotone function. If $M$ is constant,
then $\mathrm{cov}(W, M(W)) = 0$. If $M$ is increasing in the sense that there exist $0 < w_1 < w_2 < 1$
such that $M(w_1) < M(w_2)$, then $\mathrm{cov}(W, M(W)) > 0$.

Proof. The first claim is trivial. The second claim follows by introducing an independent
copy $W'$ of the random variable $W$, and rearranging the inequality

\[
E[(M(W) - M(W'))(W - W')] > 0,
\]

which holds for increasing $M$ since $(M(W) - M(W'))(W - W') \ge 0$ almost surely and
$(M(W) - M(W'))(W - W') > 0$ with strictly positive probability. This completes the
proof of the lemma. Q.E.D.
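
The rearrangement referred to in the proof of Lemma 5 is the standard symmetrization identity. It is written out below for convenience, with $W'$ denoting the independent copy of $W$; nothing beyond independence and identical distribution is used.

```latex
\begin{align*}
E\big[(M(W) - M(W'))(W - W')\big]
  &= E[W M(W)] - E[W' M(W)] - E[W M(W')] + E[W' M(W')] \\
  &= 2E[W M(W)] - 2E[W]\,E[M(W)] \\
  &= 2\,\mathrm{cov}(W, M(W)),
\end{align*}
% so a strictly positive left-hand side is the same as cov(W, M(W)) > 0.
```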

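Lemma 6. For any orthonormal basis $\{h_j, j \ge 1\}$ in $L^2[0,1]$, any $0 \le x_1 < x_2 \le 1$, and
any $\alpha > 0$,

\[
\|h_j\|_{2,t} = \Big( \int_{x_1}^{x_2} h_j^2(x)\,dx \Big)^{1/2} > j^{-1/2-\alpha}
\]

for infinitely many $j$.

Proof. Fix $M \in \mathbb{N}$ and consider any partition $x_1 = t_0 < t_1 < \dots < t_M = x_2$. Further, fix
$m = 1, \dots, M$ and consider the function

\[
h(x) = \begin{cases} \dfrac{1}{\sqrt{t_m - t_{m-1}}}, & x \in (t_{m-1}, t_m], \\[1ex] 0, & x \notin (t_{m-1}, t_m]. \end{cases}
\]

Note that $\|h\|_2 = 1$, so that

\[
h = \sum_{j=1}^\infty \beta_j h_j \text{ in } L^2[0,1], \qquad \beta_j := \frac{\int_{t_{m-1}}^{t_m} h_j(x)\,dx}{\sqrt{t_m - t_{m-1}}}, \qquad \text{and} \qquad \sum_{j=1}^\infty \beta_j^2 = 1.
\]

Therefore, by the Cauchy-Schwarz inequality,

\[
1 = \sum_{j=1}^\infty \beta_j^2 = \frac{1}{t_m - t_{m-1}} \sum_{j=1}^\infty \Big( \int_{t_{m-1}}^{t_m} h_j(x)\,dx \Big)^2 \le \sum_{j=1}^\infty \int_{t_{m-1}}^{t_m} (h_j(x))^2\,dx.
\]

Hence, summing over $m = 1, \dots, M$, we obtain $\sum_{j=1}^\infty \|h_j\|_{2,t}^2 \ge M$. Since $M$ is arbitrary,
we obtain $\sum_{j=1}^\infty \|h_j\|_{2,t}^2 = \infty$, and so for any $J$, there exists $j > J$ such that
$\|h_j\|_{2,t} > j^{-1/2-\alpha}$. Otherwise, we would have $\sum_{j=1}^\infty \|h_j\|_{2,t}^2 < \infty$. This completes the
proof of the lemma. Q.E.D.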
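
As a concrete illustration of Lemma 6 (a sketch under assumptions, not taken from the paper), consider the cosine functions $h_j(x) = \sqrt{2}\cos(j\pi x)$, which together with the constant function form an orthonormal basis of $L^2[0,1]$. The interval endpoints $x_1 = 0.2$, $x_2 = 0.8$ and the exponent $\alpha = 0.1$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def restricted_norm(j: int, x1: float, x2: float) -> float:
    """||h_j||_{2,t} for h_j(x) = sqrt(2) * cos(j*pi*x), restricted to [x1, x2]."""
    # Closed form: int 2*cos^2(j*pi*x) dx = x + sin(2*j*pi*x) / (2*j*pi).
    sq = (x2 - x1) + (
        math.sin(2 * j * math.pi * x2) - math.sin(2 * j * math.pi * x1)
    ) / (2 * j * math.pi)
    return math.sqrt(sq)


def exceeds(j: int, x1: float = 0.2, x2: float = 0.8, alpha: float = 0.1) -> bool:
    """True if the restricted norm of h_j is above the threshold j^(-1/2 - alpha)."""
    return restricted_norm(j, x1, x2) > j ** (-0.5 - alpha)


# Indices up to 500 at which the restricted norm exceeds the threshold.
hits = [j for j in range(1, 501) if exceeds(j)]
```

For this particular basis the restricted norms stay close to $\sqrt{x_2 - x_1}$ while the threshold decays, so the inequality holds for all but the first index in the range checked; the lemma itself only guarantees infinitely many such indices for a general basis.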

Lemma 7. Let $(X, W)$ be a pair of random variables defined as in Example 1. Then
Assumptions 1 and 2 of Section 2 are satisfied if $0 < x_1 < x_2 < 1$ and $0 < w_1 < w_2 < 1$.

Proof. As noted in Example 1, we have

\[
X = \Phi\big( \rho\,\Phi^{-1}(W) + (1 - \rho^2)^{1/2} U \big),
\]

where $\Phi(x)$ is the distribution function of a $N(0,1)$ random variable and $U$ is a $N(0,1)$
random variable that is independent of $W$. Therefore, the conditional distribution
function of $X$ given $W$ is

\[
F_{X|W}(x|w) := \Phi\Big( \frac{\Phi^{-1}(x) - \rho\,\Phi^{-1}(w)}{\sqrt{1 - \rho^2}} \Big).
\]

Since the function $w \mapsto F_{X|W}(x|w)$ is decreasing for all $x \in (0,1)$, condition (4) of
Assumption 1 follows. Further, to prove condition (5) of Assumption 1, it suffices to show
that

\[
\frac{\partial \log F_{X|W}(x|w)}{\partial w} \le c_F \tag{49}
\]

for some constant $c_F < 0$, all $x \in (0, x_2)$, and all $w \in (w_1, w_2)$ because, for every $x \in (0, x_2)$
and $w \in (w_1, w_2)$, there exists $\bar w \in (w_1, w_2)$ such that

\[
\log \frac{F_{X|W}(x|w_1)}{F_{X|W}(x|w_2)} = \log F_{X|W}(x|w_1) - \log F_{X|W}(x|w_2) = -(w_2 - w_1) \frac{\partial \log F_{X|W}(x|\bar w)}{\partial w}.
\]

Therefore, $\partial \log F_{X|W}(x|w)/\partial w \le c_F < 0$ for all $x \in (0, x_2)$ and $w \in (w_1, w_2)$ implies

\[
\frac{F_{X|W}(x|w_1)}{F_{X|W}(x|w_2)} \ge e^{-c_F (w_2 - w_1)} > 1
\]

for all $x \in (0, x_2)$. To show (49), observe that

\[
\frac{\partial \log F_{X|W}(x|w)}{\partial w} = -\frac{\rho}{\sqrt{1 - \rho^2}} \frac{\phi(y)}{\Phi(y)\,\phi(\Phi^{-1}(w))} \le -\frac{\sqrt{2\pi}\,\rho}{\sqrt{1 - \rho^2}} \frac{\phi(y)}{\Phi(y)}, \tag{50}
\]

where $y := (\Phi^{-1}(x) - \rho\,\Phi^{-1}(w))/(1 - \rho^2)^{1/2}$. Thus, (49) holds for some $c_F < 0$ and all
$x \in (0, x_2)$ and $w \in (w_1, w_2)$ such that $\Phi^{-1}(x) \ge \rho\,\Phi^{-1}(w)$ since $x_2 < 1$ and $0 < w_1 < w_2 < 1$.
On the other hand, when $\Phi^{-1}(x) < \rho\,\Phi^{-1}(w)$, so that $y < 0$, it follows from Proposition
2.5 in Dudley (2014) that $\phi(y)/\Phi(y) \ge (2/\pi)^{1/2}$, and so (50) implies that

\[
\frac{\partial \log F_{X|W}(x|w)}{\partial w} \le -\frac{2\rho}{\sqrt{1 - \rho^2}}
\]

in this case. Hence, condition (5) of Assumption 1 is satisfied. A similar argument also
shows that condition (6) of Assumption 1 is satisfied as well.
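
The lower bound on the ratio of the standard normal density to its distribution function for negative arguments, which drives the last step above, is easy to check numerically. The snippet is only a sanity check on a finite grid (an assumption of this sketch), with hypothetical function names; it does not replace the cited proposition.

```python
import math


def norm_pdf(y: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def norm_cdf(y: float) -> float:
    """Standard normal distribution function, via erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def density_ratio(y: float) -> float:
    """pdf(y) / cdf(y), the quantity bounded from below for y <= 0."""
    return norm_pdf(y) / norm_cdf(y)


LOWER = math.sqrt(2.0 / math.pi)              # claimed lower bound for y <= 0
grid = [-6.0 + 0.01 * k for k in range(601)]  # grid on [-6, 0]
ratios = [density_ratio(y) for y in grid]

# The constants combine as sqrt(2*pi) * sqrt(2/pi) = 2, which is where the
# factor 2 in the final bound on the log-derivative comes from.
factor = math.sqrt(2.0 * math.pi) * LOWER
```

On this grid the ratio is decreasing in $y$ and attains the bound $(2/\pi)^{1/2}$ at $y = 0$.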

We next consider Assumption 2. Since $W$ is distributed uniformly on $[0,1]$ (remember
that $\tilde W \sim N(0,1)$ and $W = \Phi(\tilde W)$), condition (iii) of Assumption 2 is satisfied. Further,
differentiating $x \mapsto F_{X|W}(x|w)$ gives

\[
f_{X|W}(x|w) := \frac{1}{\sqrt{1 - \rho^2}}\, \phi\Big( \frac{\Phi^{-1}(x) - \rho\,\Phi^{-1}(w)}{\sqrt{1 - \rho^2}} \Big) \frac{1}{\phi(\Phi^{-1}(x))}. \tag{51}
\]

Since $0 < x_1 < x_2 < 1$ and $0 < w_1 < w_2 < 1$, condition (ii) of Assumption 2 is satisfied
as well. Finally, to prove condition (i) of Assumption 2, note that since $f_W(w) = 1$ for
all $w \in [0,1]$, (51) combined with the change of variables formula with $x = \Phi(\tilde x)$ and
$w = \Phi(\tilde w)$ give


1 ^1 



0 Jo 


fx\wi.A'^)dxdw 


1 


^H-OO /• + 00 


exp 


' —oo J —oo 


1 

1 + 


-dxdw 


x^ + 


2p 


L 2(1-p2 


X 


1 -p2 

4p 


xw — 


I + P‘ 


xw + w‘ 


1 - 2 


dxdw. 


dxdw 


Since 4p/(l + p^) < 2, the integral in the last line is hnite implying that condition (i) of 
Assumption 2 is satished. This completes the proof of the lemma. Q.E.D. 


Lemma 8. Let X = Ui + U 2 W where Ui,U 2 ,W are mutually independent, Ui,U 2 ~ 
17[0,1/2] and W ~ 17[0,1]. Then Assumptions 1 and 2 of Section 2 are satisfied if 
0 < wi < W 2 < 1, 0 < xi < X 2 < I, and wi > W 2 — 

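Proof. Since the distribution of $X$ given $W = w$ is a convolution of the distributions of
$U_1$ and $U_2 w$,

\begin{align*}
f_{X|W}(x|w) &= \int_0^{1/2} f_{U_1}(x - u_2 w) f_{U_2}(u_2)\,du_2 \\
&= 4 \int_0^{1/2} 1\Big\{ 0 \le x - u_2 w \le \frac{1}{2} \Big\}\,du_2 \\
&= 4 \int_0^{1/2} 1\Big\{ \frac{x}{w} - \frac{1}{2w} \le u_2 \le \frac{x}{w} \Big\}\,du_2 \\
&= \begin{cases}
\dfrac{4x}{w}, & 0 \le x < \dfrac{w}{2}, \\[1ex]
2, & \dfrac{w}{2} \le x < \dfrac{1}{2}, \\[1ex]
\dfrac{2(1+w)}{w} - \dfrac{4x}{w}, & \dfrac{1}{2} \le x < \dfrac{1+w}{2}, \\[1ex]
0, & \dfrac{1+w}{2} \le x \le 1,
\end{cases}
\end{align*}

and, thus,

\[
F_{X|W}(x|w) = \begin{cases}
\dfrac{2x^2}{w}, & 0 \le x < \dfrac{w}{2}, \\[1ex]
2x - \dfrac{w}{2}, & \dfrac{w}{2} \le x < \dfrac{1}{2}, \\[1ex]
1 - \dfrac{2}{w} \Big( x - \dfrac{1+w}{2} \Big)^2, & \dfrac{1}{2} \le x \le \dfrac{1+w}{2}.
\end{cases}
\]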
It is easy to check that dFx\w{x\w)/dw < 0 for all x, tc G [0,1] so that condition (4) of 
Assumption 1 is satished. To check conditions (5) and (6), we proceed as in Lemma 7 
and show d\ogFx\w{x\w)/dw < 0 uniformly for all x G [x 2 ,xi] and w G {wi,W 2 ). First, 
notice that, as required by Assumption 2(iv), [xj,,Xfc] = [0,(1 + Wk)/2], k = 1,2. For 
0 < X < w/2 and w G {wi,W 2 ), 

dFx\w{x\w) —2x^lnP‘ 1 ^ ^ < 0 

dw 2x‘^/w w wi ’ 

and, for w/2 <x< 1/2 and w G {wi,W 2 ), 


dFx\w{x\w) ^ -1/2 ^ -1/2 ^ ^ Q 

dw 2x — w/2 w — w/2 wi 

Therefore, (5) holds uniformly over x G (x2,l/2) and (6) uniformly over x G (xi,l/2). 
Now, consider 1/2 < x < (1 + wi)/2 and w G {wi,W 2 )- Notice that, on this interval, 
d{Fx\w{A'^i)/P x\w{A'^‘2 ))/< 0 so that 

Fx\w{.A'^i) _ ^ ~ (^ ~ ^ _ 1 ___ W2 _^ ^ 

Fx\w{x\w2) 1 _ J_ fj; _ “ 1 _ J_ (LHa _ - 2[wi - W2y 

W2 \ 2/ W2 \ 2 2 ) 


where the last inequality uses wi > W 2 — \Jw2/2, and thus (5) holds also uniformly over 
1/2 < X < X 2 . Similarly, 

1 - Fx\w{x\F!2) _ ^ ~ ^ ^ ^ I 

1 - Fv|».(i|t5.) “ip- Vp)" ■ X (<ff ~ 

SO that (6) also holds uniformly over 1/2 < x < xi. Assumption 2(i) trivially holds. Parts 
(ii) and (hi) of Assumption 2 hold for any 0 < xi < X 2 < xi < 1 and 0 < tci < wi < 
W 2 < W 2 <1 with [x^, Xfc] = [0, (1 + Wk)/2], k = 1,2. Q.E.D. 


Lemma 9. For any increasing function $h \in L^2[0,1]$, one can find a sequence of increasing
continuously differentiable functions $h_k \in L^2[0,1]$, $k \ge 1$, such that $\|h_k - h\|_2 \to 0$ as
$k \to \infty$.

Proof. Fix some increasing $h \in L^2[0,1]$. For $a > 0$, consider the truncated function:

\[
\tilde h_a(x) := h(x) 1\{|h(x)| \le a\} + a 1\{h(x) > a\} - a 1\{h(x) < -a\}
\]

for all $x \in [0,1]$. Then $\|\tilde h_a - h\|_2 \to 0$ as $a \to \infty$ by Lebesgue's dominated convergence
theorem. Hence, by scaling and shifting $h$ if necessary, we can assume without loss of
generality that $h(0) = 0$ and $h(1) = 1$.

To approximate $h$, set $h(x) = 0$ for all $x \in \mathbb{R} \setminus [0,1]$ and for $\sigma > 0$, consider the function

\[
h_\sigma(x) := \frac{1}{\sigma} \int_0^1 h(y)\,\phi\Big( \frac{y - x}{\sigma} \Big)\,dy = \frac{1}{\sigma} \int_{-\infty}^{\infty} h(y)\,\phi\Big( \frac{y - x}{\sigma} \Big)\,dy
\]

for $x \in \mathbb{R}$ where $\phi$ is the density function of a $N(0,1)$ random variable. Theorem 6.3.14
in Stroock (1999) shows that

\[
\|h_\sigma - h\|_2 = \Big( \int_0^1 (h_\sigma(x) - h(x))^2\,dx \Big)^{1/2} \le \Big( \int_{-\infty}^{\infty} (h_\sigma(x) - h(x))^2\,dx \Big)^{1/2} \to 0
\]

as $\sigma \to 0$. The function $h_\sigma$ is continuously differentiable but it is not necessarily increasing,
and so we need to further approximate it by an increasing continuously differentiable
function. However, integration by parts yields for all $x \in [0,1]$,


Dh^{x) = —\ [ h{y)D(j) 




= -- ( 


y-x 

a 


dy 


1 — X / —X , 

- h(O)0 — - 


a 


a 


y-x 

a 


dh{y) 


1 

>—/ 
a 


1 — X 


a 


hrT.^l'ix') 


since h(0) = 0, h(l) = 1, and (f((y — x)a)dh(y) > 0 hy h being increasing. Therefore, 
the function 

da(x) + (x/a)(f((l — x)/a), for x G [0, x] 
ha{x) + {x/o')(f){{l — x)/a), for x G (T, 1] 

dehned for all x G [0,1] and some x G (0,1) is increasing and continuously differentiable 
for all X G (0, l)\a;, where it has a kink. Also, setting x = Xa- = I — and observing 
that Q < h„{x) < 1 for all x G [0,1], we obtain 

\ 1/2 ^ ^ . N N / .1 N 1/2 


Ii/v.. - <H (^) U + (i + A (^)) 


0 


as cr —)■ 0 because cr i0(cr i/^) —)■ 0. Smoothing the kink of and using the triangle 
inequality, we obtain the asserted claim. This completes the proof of the lemma. Q.E.D. 
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Lemma 10. Let {p[, q[y,..., q^)' be a sequence of i.i.d. random vectors where pi’s are 

vectors in and qi’s are vectors in Assume that ||pi|| < ||gi|| < ||E[pip'^]|| < 

Cp, and ||E[gig']^] II < Cq where fn > 1- Then for all t >0, 


Anf^ 



^n(l + t) 


where A > 0 is a constant depending only on Cp and Cq. 

Remark 21. Closely related results have been used previously by Belloni, Chernozhukov, 
Chetverikov, and Kato (2014) and Chen and Christensen (2013). 

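Proof. The proof follows from Corollary 6.2.1 in Tropp (2012). Below we perform some
auxiliary calculations. For any conformable vectors $a$ and $b$,

\begin{align*}
a' E[p_i q_i'] b &= E[(a' p_i)(b' q_i)] \\
&\le \big( E[(a' p_i)^2]\, E[(b' q_i)^2] \big)^{1/2} \le \|a\| \|b\| (C_p C_q)^{1/2}
\end{align*}

by Hölder's inequality. Therefore, $\|E[p_i q_i']\| \le (C_p C_q)^{1/2}$. Further, denote
$S_i := p_i q_i' - E[p_i q_i']$ for $i = 1, \dots, n$. By the triangle inequality and calculations above,

\begin{align*}
\|S_i\| &\le \|p_i q_i'\| + \|E[p_i q_i']\| \\
&\le \xi_n^2 + (C_p C_q)^{1/2} \le \xi_n^2 (1 + (C_p C_q)^{1/2}) =: R.
\end{align*}

Now, denote $Z_n := \sum_{i=1}^n S_i$. Then

\begin{align*}
\|E[Z_n Z_n']\| &\le n \|E[S_i S_i']\| \\
&\le n \|E[p_i q_i' q_i p_i']\| + n \|E[p_i q_i'] E[q_i p_i']\| \le n \|E[p_i q_i' q_i p_i']\| + n C_p C_q.
\end{align*}

For any conformable vector $a$,

\[
a' E[p_i q_i' q_i p_i'] a \le \xi_n^2 E[(a' p_i)^2] \le \xi_n^2 \|a\|^2 C_p.
\]

Therefore, $\|E[p_i q_i' q_i p_i']\| \le \xi_n^2 C_p$, and so

\[
\|E[Z_n Z_n']\| \le n C_p (\xi_n^2 + C_q) \le n \xi_n^2 (1 + C_p)(1 + C_q).
\]

Similarly, $\|E[Z_n' Z_n]\| \le n \xi_n^2 (1 + C_p)(1 + C_q)$, and so

\[
\sigma^2 := \max(\|E[Z_n Z_n']\|, \|E[Z_n' Z_n]\|) \le n \xi_n^2 (1 + C_p)(1 + C_q).
\]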

Hence, by Corollary 6.2.1 in Tropp (2012), 



This completes the proof of the lemma. 


Q.E.D. 

[Figure 1, two panels: Example 1 and Example 2.]
Figure 1: Plots of $F_{X|W}(x|w_1)$ and $F_{X|W}(x|w_2)$ in Examples 1 and 2, respectively.

                                                                      Model 1
                                                    $\kappa = 1$     $\kappa = 0.5$    $\kappa = 0.1$
$\sigma_\varepsilon$  $k_X$  $k_W$                uncon.     con.   uncon.     con.   uncon.     con.
0.1                   2      3      bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                                    var            0.021    0.007    0.005    0.002    0.000    0.000
                                    MSE            0.021    0.009    0.005    0.002    0.000    0.000
                                    MSE ratio         0.406             0.409             0.347
                      2      5      bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                                    var            0.009    0.004    0.002    0.001    0.000    0.000
                                    MSE            0.009    0.005    0.002    0.001    0.000    0.000
                                    MSE ratio         0.529             0.510             0.542
                      3      4      bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                                    var            0.026    0.009    0.005    0.002    0.000    0.000
                                    MSE            0.026    0.009    0.005    0.002    0.000    0.000
                                    MSE ratio         0.355             0.412             0.372
                      3      7      bias sq.       0.000    0.000    0.000    0.000    0.000    0.000
                                    var            0.013    0.005    0.003    0.001    0.000    0.000
                                    MSE            0.013    0.005    0.003    0.001    0.000    0.000
                                    MSE ratio         0.405             0.486             0.605
                      5      8      bias sq.       0.000    0.000    0.000    0.000    0.000    0.000
                                    var            0.027    0.007    0.005    0.002    0.000    0.000
                                    MSE            0.027    0.007    0.005    0.002    0.000    0.000
                                    MSE ratio         0.266             0.339             0.411

0.7                   2      3      bias sq.       0.001    0.020    0.000    0.005    0.000    0.000
                                    var            0.857    0.097    0.263    0.024    0.012    0.001
                                    MSE            0.857    0.118    0.263    0.029    0.012    0.001
                                    MSE ratio         0.137             0.110             0.101
                      2      5      bias sq.       0.001    0.015    0.000    0.004    0.000    0.000
                                    var            0.419    0.080    0.102    0.020    0.004    0.001
                                    MSE            0.420    0.095    0.102    0.024    0.004    0.001
                                    MSE ratio         0.227             0.235             0.221
                      3      4      bias sq.       0.001    0.016    0.000    0.004    0.000    0.000
                                    var            0.763    0.104    0.223    0.026    0.010    0.001
                                    MSE            0.763    0.121    0.223    0.030    0.010    0.001
                                    MSE ratio         0.158             0.133             0.119
                      3      7      bias sq.       0.001    0.011    0.000    0.003    0.000    0.000
                                    var            0.350    0.083    0.104    0.020    0.004    0.001
                                    MSE            0.351    0.094    0.104    0.023    0.004    0.001
                                    MSE ratio         0.267             0.218             0.229
                      5      8      bias sq.       0.001    0.011    0.000    0.003    0.000    0.000
                                    var            0.433    0.094    0.131    0.023    0.006    0.001
                                    MSE            0.434    0.105    0.131    0.025    0.006    0.001
                                    MSE ratio         0.243             0.193             0.170

Table 1: Model 1: Performance of the unconstrained and constrained estimators for
$N = 500$, $\rho = 0.3$, $\eta = 0.3$.
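
As a rough illustration of how the rows of Table 1 relate to one another, the following Python sketch computes the squared bias, variance, and MSE from Monte Carlo replications and forms the MSE ratio, read here as the constrained MSE divided by the unconstrained MSE (a reading that matches the reported magnitudes). Everything in it is an assumption made for illustration: the helper `mc_summary`, the averaging over an equally spaced grid, the toy regression function $g(x) = x^2$, the noise level, and the use of sorting (monotone rearrangement) as a stand-in for the constrained estimator. It is not the simulation design or the estimator used in the paper.

```python
import numpy as np


def mc_summary(estimates, truth):
    """Summarise Monte Carlo output (hypothetical helper, not from the paper).

    estimates: array of shape (R, G), R replications evaluated on G grid points.
    truth:     array of shape (G,), the true function on the same grid.
    Returns (bias squared, variance, MSE), each averaged over the grid.
    """
    mean_est = estimates.mean(axis=0)
    bias_sq = float(np.mean((mean_est - truth) ** 2))
    # ddof=0 (population variance) makes MSE = bias_sq + var hold exactly.
    var = float(np.mean(estimates.var(axis=0)))
    mse = float(np.mean((estimates - truth) ** 2))
    return bias_sq, var, mse


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
truth = grid ** 2  # assumed increasing toy function, not Model 1 or Model 2

# "Unconstrained" stand-in: the truth plus independent noise at each grid point.
uncon = truth + 0.3 * rng.standard_normal((500, grid.size))
# "Constrained" stand-in: sort each replication so that it is increasing.
con = np.sort(uncon, axis=1)

b_u, v_u, m_u = mc_summary(uncon, truth)
b_c, v_c, m_c = mc_summary(con, truth)
mse_ratio = m_c / m_u

print(f"uncon.: bias sq. {b_u:.3f}, var {v_u:.3f}, MSE {m_u:.3f}")
print(f"con.:   bias sq. {b_c:.3f}, var {v_c:.3f}, MSE {m_c:.3f}")
print(f"MSE ratio: {mse_ratio:.3f}")
```

With the population variance, the MSE equals the squared bias plus the variance exactly, which is the pattern visible in the table up to rounding. In this toy setting sorting cannot increase the squared distance to an increasing target, so the ratio is at most one; the size of the gain in the paper's designs is what the tables report.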

                                                                      Model 2
                                                    $\kappa = 1$     $\kappa = 0.5$    $\kappa = 0.1$
$\sigma_\varepsilon$  $k_X$  $k_W$                uncon.     con.   uncon.     con.   uncon.     con.
0.1                   2      3      bias sq.       0.001    0.002    0.000    0.001    0.000    0.000
                                    var            0.024    0.003    0.007    0.001    0.000    0.000
                                    MSE            0.024    0.006    0.007    0.001    0.000    0.000
                                    MSE ratio         0.229             0.201             0.222
                      2      5      bias sq.       0.001    0.002    0.000    0.000    0.000    0.000
                                    var            0.010    0.002    0.002    0.001    0.000    0.000
                                    MSE            0.011    0.004    0.002    0.001    0.000    0.000
                                    MSE ratio         0.405             0.475             0.446
                      3      4      bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                                    var            0.022    0.003    0.006    0.001    0.000    0.000
                                    MSE            0.022    0.004    0.006    0.001    0.000    0.000
                                    MSE ratio         0.192             0.176             0.157
                      3      7      bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                                    var            0.009    0.002    0.002    0.001    0.000    0.000
                                    MSE            0.009    0.003    0.002    0.001    0.000    0.000
                                    MSE ratio         0.325             0.292             0.323
                      5      8      bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                                    var            0.014    0.003    0.003    0.001    0.000    0.000
                                    MSE            0.014    0.004    0.003    0.001    0.000    0.000
                                    MSE ratio         0.269             0.268             0.217

0.7                   2      3      bias sq.       0.002    0.005    0.001    0.001    0.000    0.000
                                    var            1.102    0.032    0.321    0.008    0.012    0.000
                                    MSE            1.104    0.038    0.321    0.009    0.012    0.000
                                    MSE ratio         0.034             0.029             0.032
                      2      5      bias sq.       0.001    0.006    0.000    0.002    0.000    0.000
                                    var            0.462    0.031    0.103    0.008    0.004    0.000
                                    MSE            0.463    0.037    0.104    0.009    0.004    0.000
                                    MSE ratio         0.080             0.088             0.088
                      3      4      bias sq.       0.001    0.004    0.000    0.001    0.000    0.000
                                    var            0.936    0.036    0.255    0.009    0.012    0.000
                                    MSE            0.936    0.040    0.255    0.010    0.012    0.000
                                    MSE ratio         0.043             0.039             0.034
                      3      7      bias sq.       0.001    0.005    0.000    0.001    0.000    0.000
                                    var            0.387    0.035    0.110    0.009    0.004    0.000
                                    MSE            0.388    0.040    0.110    0.010    0.004    0.000
                                    MSE ratio         0.103             0.089             0.092
                      5      8      bias sq.       0.002    0.005    0.000    0.001    0.000    0.000
                                    var            0.508    0.041    0.144    0.010    0.007    0.000
                                    MSE            0.510    0.046    0.144    0.011    0.007    0.000
                                    MSE ratio         0.090             0.078             0.065

Table 2: Model 2: Performance of the unconstrained and constrained estimators for
$N = 500$, $\rho = 0.3$, $\eta = 0.3$.

                                                  Model 1
                                $\kappa = 1$     $\kappa = 0.5$    $\kappa = 0.1$
$\rho$  $\eta$                uncon.     con.   uncon.     con.   uncon.     con.
0.3     0.3     bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                var            0.026    0.009    0.005    0.002    0.000    0.000
                MSE            0.026    0.009    0.005    0.002    0.000    0.000
                MSE ratio         0.355             0.412             0.372
0.3     0.7     bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                var            0.026    0.008    0.005    0.002    0.000    0.000
                MSE            0.026    0.009    0.005    0.002    0.000    0.000
                MSE ratio         0.342             0.395             0.449
0.7     0.3     bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                var            0.025    0.002    0.003    0.001    0.000    0.000
                MSE            0.025    0.003    0.003    0.001    0.000    0.000
                MSE ratio         0.125             0.248             0.266
0.7     0.7     bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                var            0.023    0.002    0.004    0.001    0.000    0.000
                MSE            0.023    0.003    0.004    0.001    0.000    0.000
                MSE ratio         0.136             0.212             0.259

Table 3: Model 1: Performance of the unconstrained and constrained estimators for
$\sigma_\varepsilon = 0.1$, $k_X = 3$, $k_W = 4$, $N = 500$.

                                                  Model 2
                                $\kappa = 1$     $\kappa = 0.5$    $\kappa = 0.1$
$\rho$  $\eta$                uncon.     con.   uncon.     con.   uncon.     con.
0.3     0.3     bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                var            0.022    0.003    0.006    0.001    0.000    0.000
                MSE            0.022    0.004    0.006    0.001    0.000    0.000
                MSE ratio         0.192             0.176             0.157
0.3     0.7     bias sq.       0.000    0.001    0.000    0.000    0.000    0.000
                var            0.020    0.003    0.006    0.001    0.000    0.000
                MSE            0.020    0.004    0.006    0.001    0.000    0.000
                MSE ratio         0.209             0.163             0.160
0.7     0.3     bias sq.       0.000    0.000    0.000    0.000    0.000    0.000
                var            0.013    0.000    0.002    0.000    0.000    0.000
                MSE            0.013    0.001    0.002    0.000    0.000    0.000
                MSE ratio         0.040             0.063             0.047
0.7     0.7     bias sq.       0.000    0.000    0.000    0.000    0.000    0.000
                var            0.010    0.000    0.002    0.000    0.000    0.000
                MSE            0.011    0.001    0.002    0.000    0.000    0.000
                MSE ratio         0.051             0.060             0.050

Table 4: Model 2: Performance of the unconstrained and constrained estimators for
$\sigma_\varepsilon = 0.1$, $k_X = 3$, $k_W = 4$, $N = 500$.

Figure 3: Model 2: unconstrained and constrained estimates of $g(x)$ for $N = 500$, $\rho = 0.3$,
$\eta = 0.3$, $\sigma_\varepsilon = 0.1$, $k_X = 3$, $k_W = 4$.

[Plot title: conditional cdf of X|W]

Figure 4: Nonparametric kernel estimate of the conditional cdf $F_{X|W}(x|w)$.


[Panels: lower income group, middle income group, upper income group]

Figure 5: Estimates of $g(x, z_1)$ plotted as a function of price $x$ for $z_1$ fixed at three income
levels.